# Chapter 4:
- ### File input/output
- ### Exceptions, Assertions
- ### Graphs using ```matplotlib``` and ```seaborn```

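To work with files (e.g. reading or writing them), it is best to use 'context managers', i.e. ```with``` statements, because they automatically close the file once the indented code has run, even if an error occurs inside it. Closing the file releases it back to the operating system and makes sure everything written is saved.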
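
A minimal sketch of what ```with``` does for you: the file object reports ```closed``` as ```True``` as soon as the indented block ends. The ```try```/```finally``` version underneath is roughly what you would have to write by hand otherwise. The file name ```example.txt``` and the temporary directory are illustrative choices, not part of the original notes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")  # hypothetical file name

    # With a context manager, the file is closed automatically at the end of the block
    with open(path, "w") as f:
        f.write("hello\n")
        print("inside the block, closed?", f.closed)   # False
    print("after the block, closed?", f.closed)        # True

    # Roughly equivalent code without `with`: you must close the file yourself,
    # and try/finally makes sure that happens even if an error occurs
    g = open(path, "r")
    try:
        contents = g.read()
    finally:
        g.close()
    print(repr(contents))
```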



# Creating/Writing files:

Let's create a string of some random data and **write** it to a file.

```
s='''10, 20, 30, 40, 10, 20, 34. 
The above is some random data.
This is the third line of the string.
'''
with open('my_filename', 'w') as my_file:  # 'w' for write; my_file can be any variable name
    my_file.write(s)
```



In [1]:
s='''10, 20, 30, 40, 10, 20, 34. 
The above is some random data.
This is the third line of the string.
'''
with open('my_filename', 'w') as my_file:  #my_file can be replaced with any 'alias'
    my_file.write(s)
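
Note that mode ```'w'``` truncates (empties) an existing file before writing, so writing twice with ```'w'``` keeps only the second write. A small sketch using a hypothetical file ```demo.txt``` in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("first version\n")

    # Opening again with 'w' erases the old contents before writing
    with open(path, "w") as f:
        f.write("second version\n")

    with open(path, "r") as f:
        final_text = f.read()

print(final_text)  # only 'second version' survives
```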

In [5]:
#we see we have created a file named my_filename in the local directory
!ls

#The exclamation mark runs a terminal (shell) command from inside the notebook

2521_Syllabus.md                chapter3notes.ipynb
PythonNotesForProfessionals.pdf chapter3notes_day2.ipynb
README.md                       chapter4notes.ipynb
chapter1notes.ipynb             my_filename
chapter2notes.ipynb


In [4]:
! cat my_filename

10, 20, 30, 40, 10, 20, 34. 
The above is some random data.
This is the third line of the string.


In [8]:
# now let's open the file to read it
with open('my_filename', 'r') as f:
    print(f.read())

10, 20, 30, 40, 10, 20, 34. 
The above is some random data.
This is the third line of the string.



### Reading files

In [14]:
# now let's open the file to read it
with open('my_filename', 'r') as f:  #'r' for read
    print(f.readline())

10, 20, 30, 40, 10, 20, 34. 



In [10]:
f.read?

Signature: f.read(size=-1, /)
Docstring:
Read at most n characters from stream.

Read from underlying buffer until we have n characters or we hit EOF.
If n is negative or omitted, read until EOF.
Type:      builtin_function_or_method

In [15]:
# now let's open the file to read it
with open('my_filename', 'r') as f:  #'r' for read
    print(f.read(10))

10, 20, 30


In [18]:
# now let's open the file to read it
with open('my_filename', 'r') as f:  #'r' for read
    print(f.readline(3))

10,


In [22]:
f.readline?

Signature: f.readline(size=-1, /)
Docstring:
Read until newline or EOF.

Returns an empty string if EOF is hit immediately.
Type:      builtin_function_or_method

In [24]:
# now let's open the file to read it
with open('my_filename', 'r') as f:  #'r' for read
    print(f.readlines(2))  # 2 is a size hint: no more lines are read once more than 2 characters have been read

['10, 20, 30, 40, 10, 20, 34. \n']
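
Besides ```read```, ```readline``` and ```readlines```, a file object can be looped over directly, one line at a time, which avoids loading the whole file into memory. A sketch that recreates the three-line string from above in a temporary file (the temporary location is an illustrative assumption):

```python
import os
import tempfile

s = '''10, 20, 30, 40, 10, 20, 34.
The above is some random data.
This is the third line of the string.
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_filename")
    with open(path, "w") as f:
        f.write(s)

    lines = []
    with open(path, "r") as f:
        for line_number, line in enumerate(f, start=1):
            # each `line` keeps its trailing '\n'; rstrip() removes it (and trailing spaces)
            lines.append(line.rstrip())
            print(line_number, line.rstrip())
```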


### Appending (i.e. adding) to existing files

In [25]:
# now let's open the file and APPEND it
with open('my_filename', 'a') as f:  #'a' for append
    print(f.write('This is additional line'))  # write() returns the number of characters written

23


In [26]:
!cat my_filename

10, 20, 30, 40, 10, 20, 34. 
The above is some random data.
This is the third line of the string.
This is additional line
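
Because the appended text above has no trailing ```'\n'```, a further append would continue on the same line. A sketch (hypothetical file name, temporary directory) showing why it is usually worth ending each appended line with ```'\n'```:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "log.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("line 1\n")

    # 'a' opens the file for appending: existing content is kept, new text goes at the end
    with open(path, "a") as f:
        f.write("line 2")        # no newline at the end
    with open(path, "a") as f:
        f.write("line 3\n")      # ends up on the same line as 'line 2'
    with open(path, "a") as f:
        f.write("line 4\n")      # starts a new line, since the previous write ended with '\n'

    with open(path) as f:
        text = f.read()

print(text)
```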

### Defensive programming

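```assert``` statement

Often used to catch unexpected input to a function: if the condition is ```False```, Python raises an ```AssertionError```, showing the optional error message.


```
assert condition, 'error message (optional)'
```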

In [27]:
import math

def square_root(x):
    assert x >=0
    return math.sqrt(x)

In [29]:
square_root(9)

3.0

In [30]:
square_root(-9)

AssertionError: 

In [32]:
def square_root2(x):
    assert x >= 0, "Can't take the square root of a negative number"
    return math.sqrt(x)

In [33]:
square_root2(-9)

AssertionError: Can't take the square root of a negative number
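
Since the chapter also covers exceptions: the ```AssertionError``` raised above can be caught with ```try```/```except``` like any other exception. A minimal sketch reusing ```square_root2``` (redefined here so the block runs on its own):

```python
import math

def square_root2(x):
    assert x >= 0, "Can't take the square root of a negative number"
    return math.sqrt(x)

results = []
for value in [9, -9]:
    try:
        results.append(square_root2(value))
    except AssertionError as err:
        # err carries the message given after the comma in the assert statement
        print("Caught:", err)
        results.append(None)

print(results)
```

Keep in mind that ```assert``` statements are skipped entirely when Python is run with the ```-O``` (optimize) flag, so they are best treated as a debugging aid rather than the only check on user input.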

# Python Libraries 

 - numpy for numerical Python: arrays/matrices, random numbers, math functions
 - pandas for working with tabular data like spreadsheets
 - matplotlib and seaborn for graphing

These are the almost universal abbreviations (aliases) used when importing these libraries:
```
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

In [34]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

ModuleNotFoundError: No module named 'numpy'

In [36]:
!pip install numpy

Collecting numpy
  Obtaining dependency information for numpy from https://files.pythonhosted.org/packages/e3/7e/0b072c21f4feefb2d89600956af307db29fb7df695cbe6e145de91643155/numpy-1.25.1-cp311-cp311-macosx_10_9_x86_64.whl.metadata
  Downloading numpy-1.25.1-cp311-cp311-macosx_10_9_x86_64.whl.metadata (5.6 kB)
Using cached numpy-1.25.1-cp311-cp311-macosx_10_9_x86_64.whl (20.0 MB)
Installing collected packages: numpy
Successfully installed numpy-1.25.1

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m


In [37]:
! pip install matplotlib

Collecting matplotlib
  Obtaining dependency information for matplotlib from https://files.pythonhosted.org/packages/7e/2c/1e25437f4419f2828bbd213be42c8fd23a3b795c5c4bb776987d177fc615/matplotlib-3.7.2-cp311-cp311-macosx_10_12_x86_64.whl.metadata
  Downloading matplotlib-3.7.2-cp311-cp311-macosx_10_12_x86_64.whl.metadata (5.6 kB)
Collecting contourpy>=1.0.1 (from matplotlib)
  Obtaining dependency information for contourpy>=1.0.1 from https://files.pythonhosted.org/packages/92/4d/fe7a7098d98a8889252105193f5e869532f9dc37e39d917a82d2b0b874a1/contourpy-1.1.0-cp311-cp311-macosx_10_9_x86_64.whl.metadata
  Downloading contourpy-1.1.0-cp311-cp311-macosx_10_9_x86_64.whl.metadata (5.7 kB)
Collecting cycler>=0.10 (from matplotlib)
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting fonttools>=4.22.0 (from matplotlib)
  Obtaining dependency information for fonttools>=4.22.0 from https://files.pythonhosted.org/packages/17/f6/cca053ae27f9ff307b2c5f6f4d4c40a2e69bee4603fa566060c81a4ba7ee

In [38]:
!pip install seaborn

Collecting seaborn
  Using cached seaborn-0.12.2-py3-none-any.whl (293 kB)
Collecting pandas>=0.25 (from seaborn)
  Obtaining dependency information for pandas>=0.25 from https://files.pythonhosted.org/packages/b3/92/a5e5133421b49e901a12e02a6a7ef3a0130e10d13db8cb657fdd0cba3b90/pandas-2.0.3-cp311-cp311-macosx_10_9_x86_64.whl.metadata
  Downloading pandas-2.0.3-cp311-cp311-macosx_10_9_x86_64.whl.metadata (18 kB)
Collecting pytz>=2020.1 (from pandas>=0.25->seaborn)
  Using cached pytz-2023.3-py2.py3-none-any.whl (502 kB)
Collecting tzdata>=2022.1 (from pandas>=0.25->seaborn)
  Using cached tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Using cached pandas-2.0.3-cp311-cp311-macosx_10_9_x86_64.whl (11.6 MB)
Installing collected packages: pytz, tzdata, pandas, seaborn
Successfully installed pandas-2.0.3 pytz-2023.3 seaborn-0.12.2 tzdata-2023.3

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[

In [40]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [41]:
dir(np)

['ALLOW_THREADS',
 'BUFSIZE',
 'CLIP',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MAY_SHARE_BOUNDS',
 'MAY_SHARE_EXACT',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'RAISE',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'WRAP',
 '_CopyMode',
 '_NoValue',
 '_UFUNC_API',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__deprecated_attrs__',
 '__dir__',
 '__doc__',
 '__expired_functions__',
 '__file__',
 '__former_attrs__',
 '__future_scalars__',
 '__getattr__',
 '__git_version__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_add_newdoc_ufunc',
 '_builtins',
 '_distributor_init',
 '_fin

In [42]:
np.sqrt(9)

3.0

### Most commonly used data structure in data science:

```np.array```

Basically a matrix, i.e. a grid of numbers in which every element has the same type. It can be 1D (like a list), 2D (like a table), 3D, ...
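
A short sketch of arrays with different numbers of dimensions. ```.ndim``` gives the number of dimensions, ```.shape``` the size along each one, and ```.dtype``` the single element type shared by all entries; mixing ints and floats upcasts everything to float, which is why ```[1, 2, 2.1, 2.2]``` becomes an array of floats.

```python
import numpy as np

a1 = np.array([1, 2, 3])                      # 1D: a vector
a2 = np.array([[1, 2, 3], [4, 5, 6]])         # 2D: a matrix (list of rows)
a3 = np.zeros((2, 3, 4))                      # 3D: 2 blocks of 3 x 4 zeros

for arr in (a1, a2, a3):
    print(arr.ndim, arr.shape, arr.dtype)

mixed = np.array([1, 2, 2.1, 2.2])            # ints and floats mixed
print(mixed.dtype)                            # float64: every element is stored as a float
```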

In [43]:
np.array?

[0;31mDocstring:[0m
array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
      like=None)

Create an array.

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an object whose
    ``__array__`` method returns an array, or any (nested) sequence.
    If object is a scalar, a 0-dimensional array containing object is
    returned.
dtype : data-type, optional
    The desired data-type for the array. If not given, NumPy will try to use
    a default ``dtype`` that can represent the values (by applying promotion
    rules when necessary.)
copy : bool, optional
    If true (default), then the object is copied.  Otherwise, a copy will
    only be made if ``__array__`` returns a copy, if obj is a nested
    sequence, or if a copy is needed to satisfy any of the other
    requirements (``dtype``, ``order``, etc.).
order : {'K', 'A', 'C', 'F'}, optional
    Specify the memory layout of the array. If object is not an array, the
   

In [45]:
my_array=np.array([1, 2, 2.1, 2.2])
print(my_array)

[1.  2.  2.1 2.2]


In [47]:
my_array?

[0;31mType:[0m        ndarray
[0;31mString form:[0m [1.  2.  2.1 2.2]
[0;31mLength:[0m      4
[0;31mFile:[0m        ~/.pyenv/versions/3.11.4/envs/2521_env/lib/python3.11/site-packages/numpy/__init__.py
[0;31mDocstring:[0m  
ndarray(shape, dtype=float, buffer=None, offset=0,
        strides=None, order=None)

An array object represents a multidimensional, homogeneous array
of fixed-size items.  An associated data-type object describes the
format of each element in the array (its byte-order, how many bytes it
occupies in memory, whether it is an integer, a floating point number,
or something else, etc.)

Arrays should be constructed using `array`, `zeros` or `empty` (refer
to the See Also section below).  The parameters given here refer to
a low-level method (`ndarray(...)`) for instantiating an array.

For more information, refer to the `numpy` module and examine the
methods and attributes of an array.

Parameters
----------
(for the __new__ method; see Notes below)

shape : t

In [48]:
dir(my_array)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__class_getitem__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__dlpack__',
 '__dlpack_device__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__o

In [49]:
my_array.shape

(4,)

In [50]:
np.average(my_array)

1.825

In [55]:
my_matrix=np.array([[1, 2, 3],[4, 5, 6]]) # a 2D array is built from a list of lists; each inner list is a row
print(my_matrix)

[[1 2 3]
 [4 5 6]]


In [54]:
my_matrix2=np.array([0, 1, 2, 3, 2, 5]).reshape(2,3)
print(my_matrix2)

[[0 1 2]
 [3 2 5]]


In [56]:
my_matrix+my_matrix2

array([[ 1,  3,  5],
       [ 7,  7, 11]])

In [58]:
my_matrix * my_matrix2  # element-wise (Hadamard) product, NOT the matrix multiplication of linear algebra

array([[ 0,  2,  6],
       [12, 10, 30]])

In [59]:
my_matrix.shape

(2, 3)

In [63]:
print(my_matrix2.T)  # T is for transpose - rows become columns and columns become rows

print('shape is', my_matrix2.T.shape)

[[0 3]
 [1 2]
 [2 5]]
shape is (3, 2)
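
Since ```*``` is element-wise, linear-algebra matrix multiplication uses the ```@``` operator (or ```np.matmul```). The inner dimensions must match, so a (2, 3) matrix can be multiplied by the (3, 2) transpose. A sketch recreating both matrices from above:

```python
import numpy as np

my_matrix = np.array([[1, 2, 3], [4, 5, 6]])                 # shape (2, 3)
my_matrix2 = np.array([0, 1, 2, 3, 2, 5]).reshape(2, 3)      # shape (2, 3)

product = my_matrix @ my_matrix2.T      # (2, 3) @ (3, 2) -> (2, 2)
print(product)
# entry [0, 0] is row 0 of my_matrix dotted with row 0 of my_matrix2: 1*0 + 2*1 + 3*2 = 8

# my_matrix @ my_matrix2 fails: the inner dimensions (3 and 2) don't match
try:
    my_matrix @ my_matrix2
except ValueError as err:
    print("ValueError:", err)
```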


### Random numbers

In [64]:
np.random?

[0;31mType:[0m        module
[0;31mString form:[0m <module 'numpy.random' from '/Users/yogeshmore/.pyenv/versions/3.11.4/envs/2521_env/lib/python3.11/site-packages/numpy/random/__init__.py'>
[0;31mFile:[0m        ~/.pyenv/versions/3.11.4/envs/2521_env/lib/python3.11/site-packages/numpy/random/__init__.py
[0;31mDocstring:[0m  
Random Number Generation

Use ``default_rng()`` to create a `Generator` and call its methods.

Generator
--------------- ---------------------------------------------------------
Generator       Class implementing all of the random number distributions
default_rng     Default constructor for ``Generator``

BitGenerator Streams that work with Generator
--------------------------------------------- ---
MT19937
PCG64
PCG64DXSM
Philox
SFC64

Getting entropy to initialize a BitGenerator
--------------------------------------------- ---
SeedSequence


Legacy
------

For backwards compatibility with previous versions of numpy before 1.17, the
various aliases to t

In [65]:
dir(np.random)

['BitGenerator',
 'Generator',
 'MT19937',
 'PCG64',
 'PCG64DXSM',
 'Philox',
 'RandomState',
 'SFC64',
 'SeedSequence',
 '__RandomState_ctor',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_bounded_integers',
 '_common',
 '_generator',
 '_mt19937',
 '_pcg64',
 '_philox',
 '_pickle',
 '_sfc64',
 'beta',
 'binomial',
 'bit_generator',
 'bytes',
 'chisquare',
 'choice',
 'default_rng',
 'dirichlet',
 'exponential',
 'f',
 'gamma',
 'geometric',
 'get_bit_generator',
 'get_state',
 'gumbel',
 'hypergeometric',
 'laplace',
 'logistic',
 'lognormal',
 'logseries',
 'mtrand',
 'multinomial',
 'multivariate_normal',
 'negative_binomial',
 'noncentral_chisquare',
 'noncentral_f',
 'normal',
 'pareto',
 'permutation',
 'poisson',
 'power',
 'rand',
 'randint',
 'randn',
 'random',
 'random_integers',
 'random_sample',
 'ranf',
 'rayleigh',
 'sample',
 'seed',
 'set_bit_generator',
 'set_state',
 'shuf

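The main (legacy) functions in np.random used to generate random numbers are listed below; as the docstring above says, new code is encouraged to create a ```Generator``` with ```default_rng()``` instead:
```
rand
randint
randn
random
```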
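
Following the docstring's advice, here is a sketch of rough ```Generator``` counterparts for the legacy functions listed above. The seed value ```42``` is arbitrary, chosen only so the results are reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)          # seeded Generator: same numbers on every run

u = rng.random((2, 3))                   # like np.random.rand(2, 3): uniform on [0, 1)
ints = rng.integers(0, 10, size=(2, 3))  # like np.random.randint(0, 10, (2, 3)): values 0..9
z = rng.standard_normal((2, 3))          # like np.random.randn(2, 3): standard normal

print(u, ints, z, sep="\n")

# Re-creating the generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(42)
print(np.array_equal(u, rng_again.random((2, 3))))   # True
```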

In [66]:
np.random.rand?

[0;31mDocstring:[0m
rand(d0, d1, ..., dn)

Random values in a given shape.

.. note::
    This is a convenience function for users porting code from Matlab,
    and wraps `random_sample`. That function takes a
    tuple to specify the size of the output, which is consistent with
    other NumPy functions like `numpy.zeros` and `numpy.ones`.

Create an array of the given shape and populate it with
random samples from a uniform distribution
over ``[0, 1)``.

Parameters
----------
d0, d1, ..., dn : int, optional
    The dimensions of the returned array, must be non-negative.
    If no argument is given a single Python float is returned.

Returns
-------
out : ndarray, shape ``(d0, d1, ..., dn)``
    Random values.

See Also
--------
random

Examples
--------
>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],  #random
       [ 0.37601032,  0.25528411],  #random
       [ 0.49313049,  0.94909878]]) #random
[0;31mType:[0m      builtin_function_or_method

In [67]:
np.random.rand(2,3)

array([[0.8232211 , 0.75570762, 0.81965661],
       [0.40547578, 0.92177042, 0.07503891]])

In [68]:
np.random.randint?

[0;31mDocstring:[0m
randint(low, high=None, size=None, dtype=int)

Return random integers from `low` (inclusive) to `high` (exclusive).

Return random integers from the "discrete uniform" distribution of
the specified dtype in the "half-open" interval [`low`, `high`). If
`high` is None (the default), then results are from [0, `low`).

.. note::
    New code should use the `~numpy.random.Generator.integers`
    method of a `~numpy.random.Generator` instance instead;
    please see the :ref:`random-quick-start`.

Parameters
----------
low : int or array-like of ints
    Lowest (signed) integers to be drawn from the distribution (unless
    ``high=None``, in which case this parameter is one above the
    *highest* such integer).
high : int or array-like of ints, optional
    If provided, one above the largest (signed) integer to be drawn
    from the distribution (see above for behavior if ``high=None``).
    If array-like, must contain integer values
size : int or tuple of ints, option

In [70]:
np.random.randint(0,10, (2,3)) # a 2 x 3 array of random integers from 0 (inclusive) to 10 (exclusive), i.e. 0 to 9

array([[1, 7, 0],
       [3, 4, 4]])
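
Because ```high``` is exclusive, ```np.random.randint(0, 10, ...)``` can return 0 through 9 but never 10. The legacy functions can also be made reproducible with ```np.random.seed```. A sketch; the seed ```0``` and the sample size are arbitrary:

```python
import numpy as np

np.random.seed(0)                          # fix the legacy global random state
first = np.random.randint(0, 10, (2, 3))

np.random.seed(0)                          # same seed -> same numbers
second = np.random.randint(0, 10, (2, 3))
print(first)
print(np.array_equal(first, second))       # True

# Many draws: every value seen lies in 0..9, never 10
draws = np.random.randint(0, 10, size=10_000)
print(sorted(set(draws.tolist())))
```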