<h2>Interactive Python </h2>
    <br/>
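The IPython shell is generally recommended for interactive work in Python (see http://ipython.org/documentation.html). The examples below were run in IPython/Jupyter, so each input is labelled In [n]: with its output shown beneath it; in the standard Python shell you would type the same input at the >>> prompt.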

Normally multiline Python statements are best written in a text file rather than typed at the prompt, but some of the short examples below are done at the prompt. If you type a line that Python recognizes as the start of an unfinished block, such as the if statement below, the standard shell prompts for the rest of the block with lines starting with three dots (...):

In [3]:
if 1>2:
    print("oops!")
    print("xyz")
else:
    print("this is what we expect")
    print("abc")

this is what we expect
abc


<p>Once you have typed the full command, pressing Return alone at the ... prompt tells Python you are done, and it executes the command.</p>

<h2>Indentation</h2>

<p>Most computer languages have some form of begin-end structure, or opening and closing braces, or some such thing to clearly delineate what piece of code is in a loop, or in different parts of an if-then-else structure like the one shown above. Good programmers generally also indent their code so it is easier for a reader to see what is inside a loop, particularly if there are multiple nested loops. But in most languages this indentation is just a matter of style, and the begin-end structure of the language determines how the code is actually interpreted by the computer.
</p>
<p><strong>In Python, indentation is everything.</strong> There are no begin-end markers, only indentation. Everything that is supposed to be at one level of a loop must be indented to that level. Once the loop is done, the indentation must go back out to the previous level. There are some other rules you need to learn, such as that the “else” in an if-else block like the one above has to be indented exactly the same as the “if”. See if_else for more about this.
</p>
<p>
How many spaces to indent each level is a matter of style, but you must be consistent within a block, and it is best to be consistent throughout a file. The standard, recommended by the PEP 8 style guide, is 4 spaces per level.
</p>
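<p>Here is a small illustration of how indentation alone defines the structure. The sketch has a function containing a loop, which in turn contains an if-else. The function name classify is just a hypothetical example.</p>

```python
def classify(numbers):
    """Label each integer in numbers as 'even' or 'odd' (hypothetical example)."""
    labels = []
    for n in numbers:              # indented one level: inside the function
        if n % 2 == 0:             # two levels: inside the for loop
            labels.append("even")  # three levels: inside the if
        else:                      # lined up exactly with its if
            labels.append("odd")
    return labels                  # back to one level: runs after the loop ends

print(classify([1, 2, 3]))         # ['odd', 'even', 'odd']
```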
<h2>Wrapping lines</h2>
<p>
In Python each statement is normally one line, and there is no need for separators such as the semicolon used in some languages to end a statement. On the other hand, you can use a semicolon to put several short statements on a single line, such as:
</p>

In [4]:
x = 5; print(x)

5


Code is easier to read if you avoid this in most cases.

If a line of code is too long to fit on a single line, you can break it into multiple lines by putting a backslash at the end of a line:

In [5]:
y = 3 + \
    4

In [6]:
y

7
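<p>A backslash is not the only way to continue a line. Any expression inside parentheses, square brackets, or braces may span several lines without a backslash, and the PEP 8 style guide prefers this form. A minimal sketch:</p>

```python
# Inside (), [] or {} a statement may continue onto the next line
# without a trailing backslash.
total = (3 +
         4 +
         5)

values = [
    1, 2, 3,
    4, 5, 6,
]

print(total, len(values))   # 12 6
```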

<h2>Comments</h2>
<p>
Anything following a # in a line is ignored as a comment (unless of course the # appears in a string):
</p>

In [7]:
s = "This # is part of the string"  # this is a comment

In [8]:
s

'This # is part of the string'

<p>There is another form of comment, the docstring, discussed below following an introduction to strings.</p>

<h2>Strings</h2>
<p>Strings are specified using either single or double quotes:</p>

In [9]:
s = 'some text'
s = "some text"

<p>Both assignments are equivalent. This is useful if you want a string that itself contains quotes of the other type.</p>
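<p>For example, an apostrophe is easiest to include in a string delimited by double quotes, and vice versa. Putting a backslash before the quote is another option. A minimal sketch:</p>

```python
s1 = "It's easy"        # apostrophe inside double quotes
s2 = 'She said "hi"'    # double quotes inside single quotes
s3 = 'It\'s easy'       # a backslash escapes the quote instead
print(s1 == s3)         # True
print(s2)               # She said "hi"
```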

<p>You can also use triple quotes (three double quotes, or three single quotes), which have the advantage that such strings can span multiple lines:</p>

In [10]:
s = """Note that a ' doesn't end
... this string and that it spans two lines"""

In [11]:
s

"Note that a ' doesn't end\nthis string and that it spans two lines"

In [12]:
print(s)

Note that a ' doesn't end
this string and that it spans two lines


<p>When the string is displayed by typing s at the prompt, the line break shows up as \n. This is the newline character actually stored in the string. When we call print(s), it is printed as a line break again.</p>

<p>You can put \n in your strings as another way to break lines:</p>

In [13]:
print("This spans \n two lines")

This spans 
 two lines
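<p>You can confirm that \n is stored as a single character by checking the length of a short string that contains it. This sketch also compares the displayed form (repr) with the printed form.</p>

```python
s = "ab\ncd"
print(len(s))    # 5: \n counts as one character
print(repr(s))   # 'ab\ncd', the form displayed at the prompt
print(s)         # ab and cd printed on separate lines
```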


<h2>Docstrings</h2>
<p>
Often the first thing you will see in a Python script or module, or in a function or class defined in a module, is a brief description enclosed in triple quotes. Ordinarily this would just be a string. In this special position, however, Python does not treat it as code to execute. Instead it stores the text as the documentation of the module, function, or class, in its __doc__ attribute. It is called the docstring because it is part of the documentation, and some Python tools automatically use the docstring in various ways. See ipython for one example. Also, the documentation formatting program Sphinx, which is used to create these class notes, can automatically take a Python module and create HTML or LaTeX documentation for it from its docstrings. See the Sphinx documentation for more about this.
</p>
<p>
It’s a good idea to get in the habit of putting a docstring at the top of every Python file and function you write.
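<p>Here is a minimal sketch, where the function f is just an example. A docstring placed as the first statement of a function is stored in the function’s __doc__ attribute, which is where tools such as help() look for it. In IPython, typing f? displays the same text.</p>

```python
def f(x):
    """A quadratic function: return x**2 + 1."""
    return x**2 + 1.0

print(f.__doc__)   # A quadratic function: return x**2 + 1.
print(f(2.0))      # 5.0
```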
</p>
<h2>Running Python scripts</h2>
<p>Most Python programs are written in text files ending with the .py extension. Some of these are simple scripts that are just a set of Python instructions to be executed, the same things you might type at the >>> prompt but collected in a file (which makes it much easier to modify or reuse later). Such a script can be run at the Unix command line simply by typing “python” followed by the file name.</p>

<p>See Python scripts and modules for some examples. The section Importing modules also contains important information on how to “import” modules, and how to set the path of directories that are searched for modules when you try to import a module.</p>

<h2>Python objects</h2>
<p>Python is an object-oriented language, which just means that virtually everything you encounter in Python (variables, functions, modules, etc.) is an object of some class. There are many classes of objects built into Python and in this course we will primarily be using these pre-defined classes. For large-scale programming projects you would probably define some new classes, which is easy to do. (Maybe an example to come...)</p>
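<p>To show how easy it is to define a new class, here is a sketch of a hypothetical Point class with two attributes and one method. Objects created from it behave like any other Python object.</p>

```python
class Point:
    """A point in the plane (hypothetical example class)."""

    def __init__(self, x, y):
        # store the coordinates as attributes of this particular object
        self.x = x
        self.y = y

    def norm(self):
        """Return the distance from the origin."""
        return (self.x**2 + self.y**2) ** 0.5

p = Point(3.0, 4.0)
print(type(p).__name__)   # Point
print(p.norm())           # 5.0
```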

<p>The type function can be used to reveal the type of an object:</p>

In [14]:
import numpy as np
print(type(np))

<class 'module'>


In [15]:
print(type(np.pi))
x = np.pi
print(type(x))

<class 'float'>
<class 'float'>


In [16]:
print(type(np.cos))

<class 'numpy.ufunc'>


<p>We see that np is a module, np.pi is a floating point real number, and np.cos is of a special class that’s defined in the numpy module.</p>

<p>The linspace function creates a numerical array, which is an object of another special numpy class:</p>

In [19]:
x = np.linspace(0, 5, 8)

In [20]:
print(x)

[ 0.          0.71428571  1.42857143  2.14285714  2.85714286  3.57142857
  4.28571429  5.        ]


In [21]:
print(type(x))

<class 'numpy.ndarray'>


<p>Objects of a particular class generally have certain operations that are defined on them as part of the class definition. For example, NumPy numerical arrays have a max method defined, which we can use on x in one of two ways:</p>

In [22]:
np.max(x)

5.0

In [23]:
x.max()

5.0

<p>The first way calls the function max defined in the numpy module, passing x as its argument. The second way uses the fact that x, because it is of type numpy.ndarray, automatically has a max method. That method operates on x itself, so x.max() is called with no argument. Which way is better depends in part on what you’re doing.</p>
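<p>There is one practical difference between the two forms. np.max accepts any array-like input, such as a plain Python list, whereas the method form exists only on objects whose class defines it. A small sketch:</p>

```python
import numpy as np

L = [3, 1, 4]               # a plain Python list, not an ndarray
print(np.max(L))            # 4: np.max converts the list to an array first
print(hasattr(L, "max"))    # False: lists have no max method
print(max(L))               # 4: the built-in max works on any iterable
```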

<p>Here’s another example:</p>

In [24]:
L = [0, 1, 2]

In [25]:
type(L)

list

In [26]:
L.append(4)

In [27]:
L

[0, 1, 2, 4]

<p>L is a list (a standard Python class) and so has a method append that can be used to append an item to the end of the list.</p>

<h2>Declaring variables?</h2>
<p>In many languages, such as Fortran, you must generally declare variables before you can use them. Once you’ve specified that x is a real number, say, that is the only type of thing you can store in x, and a statement like x = ‘string’ would not be allowed.</p>

<p>In Python you don’t declare variables; you can just type, for example:</p>

In [28]:
x = 3.4

In [29]:
2*x

6.8

In [30]:
x = 'string'

In [31]:
2*x

'stringstring'

In [32]:
x = [4, 5, 6]

In [33]:
2*x

[4, 5, 6, 4, 5, 6]

<p>Here x is first used for a real number, then for a character string, then for a list. Note, by the way, that multiplication behaves differently for objects of different type (which has been specified as part of the definition of each class of objects).</p>

<p>In Fortran, if you declare x to be a double precision real variable, the compiler sets aside 8 bytes of memory for x, enough to hold one floating point number. There’s no way to store 6 characters or a list of 3 integers in these 8 bytes.</p>

<p>In Python it is often better to think of x as simply being a pointer that points to some object. When you type “x = 3.4” Python creates an object of type float holding one real number and points x to that. When you type x = ‘string’ it creates a new object of type str and now points x to that, and so on.</p>
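<p>Here is a short sketch of this pointer picture. After y = x, both names point to one object, and pointing x at a new object leaves y untouched. The is operator used below tests whether two names point to the same object.</p>

```python
x = 3.4
y = x                # y points to the same float object as x
print(y is x)        # True
x = 'string'         # x now points to a new str object...
print(y)             # 3.4  ...while y still points to the float
print(type(x).__name__, type(y).__name__)   # str float
```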

<h2>Lists</h2>
<p>We have already seen lists in the example above.</p>

<p>Note that indexing in Python always starts at 0:</p>

In [41]:
L = [4,5,6]

In [45]:
L[0]

4

In [36]:
L[1]

5

<p>Elements of a list need not all have the same type. For example, here’s a list with 5 elements:</p>

In [46]:
L = [5, 2.3, 'abc', [4,'b'], np.cos]
for i in L:
    print(i)

5
2.3
abc
[4, 'b']
<ufunc 'cos'>


Here’s a way to see what each element of the list is, and its type:

In [47]:
for index,value in enumerate(L):
    print('L[%s] is %16s     %s' % (index,value,type(value)))
    print(index, value)

L[0] is                5     <class 'int'>
0 5
L[1] is              2.3     <class 'float'>
1 2.3
L[2] is              abc     <class 'str'>
2 abc
L[3] is         [4, 'b']     <class 'list'>
3 [4, 'b']
L[4] is    <ufunc 'cos'>     <class 'numpy.ufunc'>
4 <ufunc 'cos'>


<p>Note that L[3] is itself a list containing an integer and a string and that L[4] is a function.</p>

<p>One nice feature of Python is that you can also index backwards from the end: since L[0] is the first item, L[-1] is what you get going one to the left of this, and wrapping around (periodic boundary conditions in math terms):</p>

In [48]:
for index in [-1, -2, -3, -4, -5]:
    print('L[%s] is %16s' % (index, L[index]))

L[-1] is    <ufunc 'cos'>
L[-2] is         [4, 'b']
L[-3] is              abc
L[-4] is              2.3
L[-5] is                5


<p>In particular, L[-1] always refers to the last item in list L.</p>
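<p>Negative indices also combine with slicing, which extracts a sub-list using start:stop. The element at the stop index is not included. Here is a sketch using a shorter list than the one above:</p>

```python
L = [5, 2.3, 'abc', [4, 'b']]
print(L[-1])     # [4, 'b']   the last element
print(L[1:3])    # [2.3, 'abc']   indices 1 and 2 (stop index 3 excluded)
print(L[:2])     # [5, 2.3]   from the start up to, not including, index 2
print(L[-2:])    # ['abc', [4, 'b']]   the last two elements
print(L[::-1])   # [[4, 'b'], 'abc', 2.3, 5]   a reversed copy
```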

<h2>Copying objects</h2>
<p>One implication of the fact that variables are just pointers to objects is that two names can point to the same object, which can sometimes cause confusion. Consider this example:</p>

In [42]:
x = [4,5,6]

In [43]:
y = x

In [44]:
y

[4, 5, 6]

In [45]:
y.append(9)

In [47]:
y

[4, 5, 6, 9]

<p>So far nothing too surprising. We initialized y to be x and then we appended another list element to y. But take a look at x:</p>

In [48]:
x

[4, 5, 6, 9]

<p>We didn’t really append 9 to y, we appended it to the object y points to, which is the same object x points to!</p>

<p>Failing to pay attention to this sort of thing can lead to programming nightmares.</p>

<p>What if we really want y to be a different object that happens to be initialized by copying x? We can do this by:</p>

In [49]:
x = [4,5,6]

In [50]:
y = list(x)

In [51]:
y

[4, 5, 6]

In [52]:
y.append(9)

In [53]:
y

[4, 5, 6, 9]

In [54]:
x

[4, 5, 6]

<p>This is what we want. Here list(x) creates a new object, that is a list, using the elements of the list x to initialize it, and y points to this new object. Changing this object doesn’t change the one x pointed to.</p>

<p>You could also use the copy module, which works in general for any objects:</p>

In [55]:
import copy
y = copy.copy(x)

<p>Sometimes it is more complicated, if the list x itself contains other objects. See http://docs.python.org/library/copy.html for more information.</p>
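<p>Here is a sketch of the more complicated case. When a list contains other lists, copy.copy makes a new outer list whose elements still point to the same inner lists (a shallow copy). copy.deepcopy also copies the inner lists.</p>

```python
import copy

x = [[1, 2], [3, 4]]       # a list that contains other (mutable) lists
shallow = copy.copy(x)     # new outer list, but the inner lists are shared
deep = copy.deepcopy(x)    # new outer list AND new copies of the inner lists

x[0].append(99)            # modify one inner list in place

print(shallow[0])          # [1, 2, 99]: shares x[0]
print(deep[0])             # [1, 2]: has its own copy
print(shallow is x, shallow[0] is x[0])   # False True
```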

<p>There are some objects that cannot be changed once created (immutable objects, as described further below). In particular, for floats and integers, you can do things like:</p>

In [56]:
x = 3.4
y = x
y = y+1
y

4.4

In [57]:
x

3.4

<p>Here changing y did not change x, luckily. We don’t have to explicitly make a copy of x for y in this case. If we did, writing any sort of numerical code in Python would be a nightmare.</p>

<p>We don’t need to, because the command:</p>

In [58]:
y = y+1

<p>above is not changing the object y points to, instead it is creating a new object that y now points to, while x still points to the old object.</p>
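<p>Comparing ids before and after y = y+1 makes this visible. A minimal sketch:</p>

```python
x = 3.4
y = x
print(id(y) == id(x))   # True: both names point to one float object
y = y + 1               # creates a new float object and points y at it
print(id(y) == id(x))   # False: y now points elsewhere
print(x)                # 3.4: x is unchanged
```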

<p>For more about built-in data types in Python, see http://docs.python.org/release/2.5.2/ref/types.html.</p>

<h2>Mutable and Immutable objects</h2>
<p>Some objects can be changed after they have been created and others cannot be. Understanding the difference is key to understanding why the examples above concerning copying objects behave as they do.</p>

<p>A list is a mutable object. The statement:</p>

In [59]:
x = [4,5,6]

<p>above created an object that x points to, and the data held in this object can be changed without having to create a new object. The statement</p>

In [60]:
y = x

<p>points y at the same object, and since it can be changed, any change will affect the object itself and be seen whether we access it using the pointer x or y.</p>

<p>We can check this by comparing the ids of the objects that x and y point to:</p>

In [61]:
id(x) == id(y)

True

<p>The id function returns an integer that uniquely identifies the object during its lifetime; in CPython this is the object’s address in memory. If you do something like x[0] = 1, you will find that the ids have not changed. x and y both still point to the same object, but the data stored in that object has changed.</p>
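<p>Here is a minimal sketch of the x[0] = 1 experiment described above, comparing the id of x before and after the change:</p>

```python
x = [4, 5, 6]
y = x
before = id(x)
x[0] = 1                 # change the data stored inside the list object
print(id(x) == before)   # True: x still points to the same object
print(x is y)            # True: and so does y
print(y)                 # [1, 5, 6]: the change is visible through y
```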

<p>Some data types correspond to immutable objects that, once created, cannot be changed. Integers, floats, and strings are immutable:</p>

In [62]:
s = "This is a string"

In [66]:
s[0]

'T'

In [64]:
s[0] = 'b'

TypeError: 'str' object does not support item assignment

In [67]:
id(s)

140417799495088

In [68]:
s = "New string"

In [69]:
id(s)

140417799516592

<p>What happened to the old object? It depends on whether any other variable was pointing to it. If not, as in the example above, then Python’s garbage collection would recognize it’s no longer needed and free up the memory for other uses. But if any other variable is still pointing to it, the object will still exist, e.g.</p>

In [70]:
s2=s

In [72]:
id(s2)  # same object as s above

140417799516592

In [73]:
s = "Yet another string"   # creates a new object

In [74]:
id(s)                       # s now points to new object

140417799494512

In [76]:
id(s2)                      # s2 still points to the old one

140417799516592

<h2>Tuples</h2>
<p>We have seen that lists are mutable. For some purposes we need something like a list that is immutable (e.g. for dictionary keys, see below). A tuple is like a list but defined with parentheses (..) rather than square brackets [..]:</p>



In [77]:
t = (4,5,6)

In [78]:
t[0]

4

In [79]:
t[0] = 9

TypeError: 'tuple' object does not support item assignment
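<p>This sketch shows why immutability matters for dictionary keys. A tuple can serve as a key, but a list cannot, because mutable built-in types such as lists are unhashable.</p>

```python
grid = {(0, 0): 'origin', (1, 2): 'a point'}   # tuple keys are fine
print(grid[(1, 2)])              # a point

try:
    bad = {[1, 2]: 'oops'}       # a list cannot be used as a key
except TypeError as err:
    print("TypeError:", err)     # the message reports that list is unhashable
```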

<h2>Iterators</h2>
<p>We often want to iterate over a set of things. In Python there are many ways to do this, and it often takes the form:</p>

In [None]:
for A in B:
    # do something, probably involving the current A
    pass

<p>In this construct B is any Python object that is iterable, meaning it has a built-in way (when B’s class was defined) of starting with one thing in B and progressing through the contents of B in some hopefully logical order.</p>
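<p>Under the hood, a for loop asks B for an iterator with the built-in iter(). It then calls next() on that iterator repeatedly until next() raises StopIteration. Here is a sketch that does those steps by hand:</p>

```python
B = [3, 7, 'b']
it = iter(B)           # ask B for an iterator over its contents
print(next(it))        # 3
print(next(it))        # 7
print(next(it))        # b
try:
    next(it)           # nothing left: the iterator signals the end
except StopIteration:
    print("done")      # a for loop catches StopIteration for you
```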

<p>Lists and tuples are iterable in the obvious way: we step through them one element at a time, starting at the beginning:</p>

In [83]:
for i in [3, 7, 'b']:
    print("i is now ", i)

i is now  3
i is now  7
i is now  b


<h2>range</h2>
<p>In numerical work we often want to have i start at 0 and go up to some number N, stepping by one. We obviously don’t want to have to construct the list [0, 1, 2, 3, ..., N] by typing all the numbers when N is large, so Python has a way of doing this:</p>

In [87]:
range(7)

range(0, 7)

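NOTE: The last value is 6, not 7. The range contains 7 values but starts by default at 0, just as Python indexing does. In Python 3, range returns a range object rather than a list, which is why it displays as range(0, 7); list(range(7)) shows the values [0, 1, 2, 3, 4, 5, 6]. Starting at 0 makes range convenient for doing things like: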

In [88]:
L = ['a', 8, 12]

In [89]:
for i in range(len(L)):
    print("i = ", i, "  L[i] = ", L[i])

i =  0   L[i] =  a
i =  1   L[i] =  8
i =  2   L[i] =  12


Note that len(L) returns the length of the list, so range(len(L)) always produces exactly the valid indices for the list L.

<h2>enumerate</h2>
<p>Another way to do this is:</p>

In [91]:
for i,value in enumerate(L):
    print("i = ",i, "  L[i] = ",value)

i =  0   L[i] =  a
i =  1   L[i] =  8
i =  2   L[i] =  12


range can be used with more arguments, for example if you want to start at 2 and step by 3 up to 20:

In [94]:
range(2,20,3)

range(2, 20, 3)

<p>Note that this doesn’t go up to 20. Just as range(7) stops at 6, this range stops before the stop value: its values are 2, 5, 8, 11, 14, 17.</p>
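<p>Wrapping a range in list() is a simple way to see exactly which values it produces, including for a negative step:</p>

```python
print(list(range(7)))          # [0, 1, 2, 3, 4, 5, 6]
print(list(range(2, 20, 3)))   # [2, 5, 8, 11, 14, 17]
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]  a negative step counts down
print(len(range(2, 20, 3)))    # 6
```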

<p>NumPy has a linspace command that behaves like Matlab’s, which is sometimes more useful in numerical work, e.g.:</p>

In [95]:
np.linspace(2,20,7)

array([ 2.,  5.,  8., 11., 14., 17., 20.])

<p>This returns a NumPy array with 7 equally spaced points between 2 and 20, including the endpoints. Note that the elements are floats, not integers. You could use this as an iterator too.</p>

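<p>If you plan to iterate over a lot of values, say 1 million, it would be inefficient to build a list with 1 million elements just to loop over it. In Python 2, range did build such a list, and xrange was the alternative that produced the values one at a time without storing them. In Python 3, xrange no longer exists and range itself behaves this way. A range object stores only its start, stop, and step, and generates the values as the iteration proceeds.</p>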
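<p>This rough sketch shows the difference. It uses sys.getsizeof to compare the memory used by a range object with that used by the equivalent list. The exact byte counts vary between Python versions, so only the comparison is printed.</p>

```python
import sys

r = range(1_000_000)   # stores only start, stop and step
L = list(r)            # materializes all one million values
print(sys.getsizeof(r) < sys.getsizeof(L))   # True
print(r[10], 999_999 in r)   # 10 True: indexing and membership work without a list
```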


<h2>Python scripts and modules</h2>
<p>A Python script is a collection of commands in a file designed to be executed like a program. The file can of course contain functions and import various modules, but the idea is that it will be run or executed from the command line or from within a Python interactive shell to perform a specific task. Often a script first contains a set of function definitions and then has the main program that might call the functions.</p>

<h5>script1.py</h5>

In [96]:
"""
Sample script to print values of a function at a few points.
"""
import numpy as np

def f(x):
    """
    A quadratic function.
    """
    y = x**2 + 1.
    return y

print("     x        f(x)")
for x in np.linspace(0,4,3):
    print("%8.3f  %8.3f" % (x, f(x)))

     x        f(x)
   0.000     1.000
   2.000     5.000
   4.000    17.000


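The main program starts with the first print call; everything before it only sets the docstring, imports numpy, and defines the function f.

There are several ways to run a script contained in a file.

At the Unix prompt:

$ python script1.py

From within Python: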

In [60]:
exec(open("script1.py").read())

     x        f(x)
   0.000     1.000
   2.000     5.000
   4.000    17.000


From within IPython, you can use either exec as above or the run magic command:

In [102]:
run script1.py

     x        f(x)
   0.000     1.000
   2.000     5.000
   4.000    17.000


Or, you can import the file as a module (see Importing modules below for more about this):

In [103]:
import script1

     x        f(x)
   0.000     1.000
   2.000     5.000
   4.000    17.000


Note that this also gives the same output. Whenever a module is imported, any statements that are in the main body of the module are executed when it is imported. In addition, any variables or functions defined in the file are available as attributes of the module, e.g.,

In [104]:
script1.f(4)

17.0

In [105]:
script1.np

<module 'numpy' from '/home/manmeet/anaconda3/envs/py35/lib/python3.6/site-packages/numpy/__init__.py'>

Note there are some differences between executing the script and importing it. When it is executed as a script, it is as if the commands were typed at the command line. Hence:

In [106]:
exec(open("script1.py").read())

     x        f(x)
   0.000     1.000
   2.000     5.000
   4.000    17.000


In [107]:
f

<function __main__.f(x)>

In [108]:
np

<module 'numpy' from '/home/manmeet/anaconda3/envs/py35/lib/python3.6/site-packages/numpy/__init__.py'>

<p>In this case f and np are in the namespace of the interactive session as if we had defined them at the prompt.</p>

<h2>Writing scripts for ease of importing</h2>
<p>The script used above as an example contains a function f(x) that we might want to be able to import without necessarily running the main program. This can be arranged by modifying the script as follows:</p>

<h5>script2.py</h5>

In [109]:
"""
Sample script to print values of a function at a few points.
The printing is only done if the file is executed as a script, not if it is
imported as a module.
"""
import numpy as np

def f(x):
    """
    A quadratic function.
    """
    y = x**2 + 1.
    return y

def print_table():
    print("     x        f(x)")
    for x in np.linspace(0,4,3):
        print("%8.3f  %8.3f" % (x, f(x)))

if __name__ == "__main__":
    print_table()

     x        f(x)
   0.000     1.000
   2.000     5.000
   4.000    17.000


When a file is imported or executed, an attribute __name__ is automatically set, and has the value __main__ only if the file is executed as a script, not if it is imported as a module. So we see the following behavior:

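Running script2.py as a script, for example with python script2.py at the Unix prompt, prints the table just as with script1.py, but: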

In [110]:
import script2           # does not print table

In [112]:
script2.__name__         # not '__main__'

'script2'

In [113]:
script2.f(4)

17.0

In [114]:
script2.print_table()

     x        f(x)
   0.000     1.000
   2.000     5.000
   4.000    17.000


<h2>Reloading modules</h2>
<p>When you import a module, Python keeps track of the fact that it has been imported. If it encounters another statement importing the same module, it will not bother to load it again (the modules already imported are recorded in the dictionary sys.modules). This is convenient since loading a module can be time consuming. For example, if you’re debugging a script using exec or run from an IPython shell, numpy will not be reloaded each time you change the script and re-execute it.</p>
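<p>Here is a small sketch of this caching, using the standard json module as an arbitrary example. Importing it a second time, even under another name, gives back the object already stored in sys.modules.</p>

```python
import sys
import json                    # first import: the module is loaded
import json as json_again      # second import: taken from the sys.modules cache

print('json' in sys.modules)         # True
print(json is json_again)            # True: the very same module object
print(sys.modules['json'] is json)   # True
```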

<p>Sometimes, however, you want to force reloading of a module, in particular if it has changed (e.g. when we are debugging it).</p>

<p>Suppose, for example, that we modify script2.py so that the quadratic function is changed from $y = x^{2} + 1$  to $y = x^{2} + 10$. If we make this change and then try the following (in the same Python session as above, where script2 was already imported as a module):</p>

In [115]:
import script2

In [116]:
script2.print_table()

     x        f(x)
   0.000     1.000
   2.000     5.000
   4.000    17.000


we get the same results as above, even though we changed script2.py.<br>

We have to use the reload function from the importlib module to see the change we want:

In [118]:
from importlib import reload
reload(script2)

<module 'script2' from '/home/manmeet/Documents/teri/script2.py'>

In [119]:
script2.print_table()

     x        f(x)
   0.000    10.000
   2.000    14.000
   4.000    26.000


<h2>Command line arguments</h2>
<p>We might want to make this script a bit fancier by adding an optional argument to the print_table function to print a different number of points, rather than the 3 points shown above.</p>

<p>The next version has this change, and also has a modified version of the main program that allows the user to specify this value n as a command line argument:</p>

<h5>script3.py</h5>

In [139]:
"""

Modification of script2.py that allows a command line argument telling how
many points to plot in the table.

Usage example: To print table with 5 values:
   python script3.py 5

"""
import numpy as np

def f(x):
    """
    A quadratic function.
    """
    y = x**2 + 1.
    return y

def print_table(n=2):
    print("     x        f(x)")
    for x in np.linspace(0,4,n):
        print("%8.3f  %8.3f" % (x, f(x)))

if __name__ == "__main__":
    """
    What to do if the script is executed at command line.
    Note that sys.argv is a list of the tokens typed at the command line.
    """
    import sys
    print("sys.argv is ",sys.argv)
    if len(sys.argv) > 1:
        try:
            n = int(sys.argv[1])
            print_table(n)
        except ValueError:
            print("*** Error: expect an integer n as the argument")
    else:
        print_table()

sys.argv is  ['/home/manmeet/anaconda3/envs/py35/lib/python3.6/site-packages/ipykernel_launcher.py', '-f', '/home/manmeet/.local/share/jupyter/runtime/kernel-a6904343-3430-4b67-a126-523d7a5b7e3c.json']
*** Error: expect an integer n as the argument


Note that:


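* sys.argv, from the sys module, is a list of strings holding the arguments that were present when the script was executed from the command line. sys.argv[0] is the name of the script itself, sys.argv[1] is the next item on the line, and so on (if there was more than one command line argument, separated by spaces).
* We use int(sys.argv[1]) to convert the argument, if present, from a string to an integer.
* We put this conversion in a try-except block, which catches the ValueError raised by int() if the user gives an invalid argument.
<br>
<p>The output shown above comes from running the cell inside a Jupyter notebook. There, sys.argv holds the notebook kernel’s own arguments ('-f' and the path to a connection file) rather than an integer, so the script reports an error. To see the intended behavior, run the script from the Unix prompt instead. For example, python script3.py 5 prints a table of 5 values.</p>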

<h2>Importing modules</h2>
<p>When Python starts up there are a certain number of basic commands defined along with the general syntax of the language, but most useful things needed for specific purposes (such as working with webpages, or solving linear systems) are in modules that do not load by default. Otherwise it would take forever to start up Python, loading lots of things you don’t plan to use. So when you start using Python, either interactively or at the top of a script, often the first thing you do is import one or more modules.</p>

<p>A Python module is often defined simply by grouping a set of parameters and functions together in a single .py file. See Python scripts and modules for some examples.</p>

<p>Two useful modules are os and sys that help you interact with the operating system and the Python system that is running. These are standard modules that should be available with any Python implementation, so you should be able to import them at the Python prompt:</p>

In [122]:
import os, sys

Each module contains many different functions and parameters which are the methods and attributes of the module. Here we will only use a couple of these. The getcwd method of the os module is called to return the “current working directory” (the same thing pwd prints in Unix), e.g.:

In [123]:
os.getcwd()

'/home/manmeet/Documents/teri'

Note that this function is called with no arguments, but you need the open and close parens. If you type “os.getcwd” without these, Python will instead print what type of object this function is:

In [124]:
os.getcwd

<function posix.getcwd()>

<h2>The Python Path</h2>
<p>The sys module has an attribute sys.path, a list that is set by default to the search path for modules. Whenever you perform an import, this is the set of directories that Python searches through looking for a file by that name (with a .py extension). If you print it, you will see a list of strings, each one of which is the full path to some directory. In an interactive session the list usually also contains the empty string, which means “the current directory”. Python searches the directories in the order listed, so your working directory is searched before every directory that comes after the empty string:</p>

In [125]:
print(sys.path)

['/home/manmeet/anaconda3/envs/py35/lib/python36.zip', '/home/manmeet/anaconda3/envs/py35/lib/python3.6', '/home/manmeet/anaconda3/envs/py35/lib/python3.6/lib-dynload', '', '/home/manmeet/anaconda3/envs/py35/lib/python3.6/site-packages', '/home/manmeet/anaconda3/envs/py35/lib/python3.6/site-packages/IPython/extensions', '/home/manmeet/.ipython']


If you try to import a module and it doesn’t find a file with this name on the path, then you will get an import error:

In [126]:
import junkname

ModuleNotFoundError: No module named 'junkname'

When new Python software such as NumPy or SciPy is installed, the installation script should modify the path appropriately so it can be found. You can also add to the path if you have your own directory that you want Python to look in, e.g.:

sys.path.append('/path/to/mydir')

will append the directory indicated (here /path/to/mydir is a placeholder for your own directory) to the path. To avoid having to do this each time you start Python, you can set the Unix environment variable PYTHONPATH, which is used to modify the path every time Python is started. First print out the current value of this variable:

$ echo $PYTHONPATH

It will probably be blank unless you’ve set this before or have installed software that sets this automatically. To append the above example directory to this path:

$ export PYTHONPATH=$PYTHONPATH:/path/to/mydir

This appends another directory to the search path already specified (if any). You can repeat this multiple times to add more directories, or put something like:

export PYTHONPATH=/path/to/dir1:/path/to/dir2:/path/to/dir3

in your .bashrc file if these are the only 3 personal directories you always want to search.
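<p>You can also try the same mechanism from within Python. This sketch creates a temporary directory holding a hypothetical module, mymod_demo.py. It then appends that directory to sys.path and imports the module.</p>

```python
import os
import sys
import tempfile

# Create a temporary directory containing a small hypothetical module.
mydir = tempfile.mkdtemp()
with open(os.path.join(mydir, "mymod_demo.py"), "w") as fh:
    fh.write("def greet():\n    return 'hello from mymod_demo'\n")

sys.path.append(mydir)      # add the directory to the end of the search path

import mymod_demo           # found by searching the directories in sys.path
print(mymod_demo.greet())   # hello from mymod_demo
```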

<h2>Other forms of import</h2>
<p>If all we want to use from the os module is getcwd, then another option is to do:</p>

In [127]:
from os import getcwd

In [128]:
getcwd()

'/home/manmeet/Documents/teri'

<p>In this case we only imported one method from the module, not the whole thing. Note that now getcwd is called by just giving the name of the method, not module.method. The name getcwd is now in our namespace. If we only imported getcwd and tried typing “os.getcwd()” we’d get an error, since it wouldn’t find os in our namespace.</p>

<p>You can rename things when you import them, which is sometimes useful if different modules contain different objects with the same name. For example, to compare the sqrt function in the standard Python math module with the numpy version:</p>

In [129]:
from math import sqrt as sqrtm
from numpy import sqrt as sqrtn

In [130]:
sqrtm(-1.)

ValueError: math domain error

In [131]:
sqrtn(-1.)

  """Entry point for launching an IPython kernel.


nan

<p>The standard function raises an error, whereas the numpy version returns nan, a special floating-point value representing “Not a Number”. NumPy typically also prints a RuntimeWarning about an invalid value.</p>
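<p>If you actually want the square root of a negative number, the standard cmath module returns a complex result. This sketch compares the three behaviors. It uses np.errstate only to suppress NumPy’s warning.</p>

```python
import cmath
import math
import numpy as np

try:
    math.sqrt(-1.0)
except ValueError as err:
    print("math.sqrt:", err)          # math domain error

with np.errstate(invalid="ignore"):   # silence NumPy's RuntimeWarning here
    r = np.sqrt(-1.0)
print("np.sqrt:", r, np.isnan(r))     # nan True

print("cmath.sqrt:", cmath.sqrt(-1))  # 1j: the complex square root
```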

<p>You can also import a module and give it a different name locally. This is particularly useful if you import a module with a long name, but even for numpy many examples you’ll find on the web abbreviate this as np (see Numerics in Python):</p>

In [132]:
import numpy as np
theta = np.linspace(0., 2*np.pi, 5)

In [133]:
theta

array([0.        , 1.57079633, 3.14159265, 4.71238898, 6.28318531])

In [134]:
np.cos(theta)

array([ 1.0000000e+00,  6.1232340e-17, -1.0000000e+00, -1.8369702e-16,
        1.0000000e+00])

If you don’t like having to type the module name repeatedly you can import just the things you need into your namespace:

In [135]:
from numpy import pi, linspace, cos
theta = linspace(0., 2*pi, 5)

In [136]:
theta

array([0.        , 1.57079633, 3.14159265, 4.71238898, 6.28318531])

In [137]:
cos(theta)

array([ 1.0000000e+00,  6.1232340e-17, -1.0000000e+00, -1.8369702e-16,
        1.0000000e+00])

If you’re going to be using lots of things from numpy, you might want to import everything into your namespace:

In [138]:
from numpy import *

Then linspace, pi, cos, and several hundred other things will be available without the prefix.

When writing code it is often best to not do this, however, since then it is not clear to the reader (or even to the programmer sometimes) what methods or attributes are coming from which module if several different modules are being used. (They may define methods with the same names but that do very different things, for example.)
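<p>This sketch shows the kind of clash meant here. math and numpy both define sqrt, and with two star imports the later one silently wins.</p>

```python
import math
import numpy as np

from math import *    # puts math.sqrt into the namespace as sqrt
from numpy import *   # silently replaces it with numpy.sqrt

print(sqrt is np.sqrt)     # True
print(sqrt is math.sqrt)   # False
```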

In [144]:
np.linspace(0,4,3)

array([0., 2., 4.])

<h2>Python functions</h2>
<p>Functions are easily defined in Python using def, for example:</p>

In [146]:
def myfcn(x):
    import numpy as np
    y = np.cos(x) * np.exp(x)
    return y

In [147]:
myfcn(0.)

1.0

In [148]:
myfcn(1.)

1.4686939399158851

<p>As elsewhere in Python, there is no begin-end notation except the indentation. If you are defining a function at the command line as above, you need to input a blank line to indicate that you are done typing in the function.</p>

<h2>Defining functions in modules</h2>
<p>Except for very simple functions, you do not want to type them in at the command line in Python. Normally you want to create a text file containing your functions and import the resulting module into your interactive session.</p>

<p>If you have a file named myfile.py that contains a definition of myfcn like the one above, and this file is in your Python search path (see python_path), then you can do:</p>

In [150]:
from myfile import myfcn

In [151]:
myfcn(0.)

1.0

In [152]:
myfcn(1.)

1.4686939399158851

In Python a function is an object that can be manipulated like any other object.
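A minimal sketch of what "manipulated like any other object" means in practice: a function can be passed as an argument, stored in a dictionary, and bound to another name. The names apply_twice, square, and operations are hypothetical, chosen only for illustration:

```python
import math

def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))

def square(x):
    return x * x

print(apply_twice(square, 3))        # 81: square is passed like any other argument
operations = {"square": square, "sqrt": math.sqrt}
print(operations["sqrt"](16.0))      # 4.0: functions can be stored in containers
alias = square                       # a second name for the same function object
print(alias is square, alias(5))     # True 25
print(square.__name__)               # square
```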

<h2>Lambda functions</h2>
<p>Some functions can be easily defined in a single line of code, and it is sometimes useful to be able to define a function “on the fly” using “lambda” notation. To define a function that returns 2*x for any input x, rather than:</p>

In [153]:
def f(x):
    return 2*x

we could also define f via:

In [154]:
f = lambda x: 2*x

You can also define functions of more than one variable, e.g.:

In [155]:
g = lambda x,y: 2*(x+y)
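Lambda functions are called exactly like functions defined with def. A common use is as a short key function, for example when sorting; the word list below is only an illustration:

```python
f = lambda x: 2*x
g = lambda x, y: 2*(x + y)

print(f(3))      # 6
print(g(1, 2))   # 6

# Sort words by length, using a lambda as the key instead of a separate def
words = ["banana", "kiwi", "apple"]
by_length = sorted(words, key=lambda w: len(w))
print(by_length)   # ['kiwi', 'apple', 'banana']
```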

<h2>Python strings</h2>

<h4>String formatting</h4>

<p>Often you want to construct a string that incorporates the values of some variables. This can be done using the form format % values where format is a string that describes the desired format and values is a single value or tuple of values that go into various slots in the format.</p>

<p>This is best learned from some examples:</p>

In [156]:
x = 45.6

In [157]:
s = "The value of x is %s"  % x

In [158]:
s

'The value of x is 45.6'

The %s in the format string means to convert x to a string and insert it into the format, using as few characters as possible.

In [159]:
s = "The value of x is %21.14e"  % x

In [160]:
s

'The value of x is  4.56000000000000e+01'

In the case above, exponential notation is used with 14 digits to the right of the decimal point, in a field 21 characters wide. (You need at least 7 extra characters to leave room for a possible minus sign as well as the first digit, the decimal point, and an exponent such as e+01.)

In [161]:
y = -0.324876

In [162]:
s = "Now x is %8.3f and y is %8.3f" % (x,y)

In [163]:
s

'Now x is   45.600 and y is   -0.325'

<p>In this example, two variables are inserted into the format string. Fixed-point notation is used instead of scientific notation, with 3 digits to the right of the decimal point, in a field 8 characters wide. Note that y has been rounded.</p>
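The same width and precision specifiers also work with the str.format method and with f-strings (the latter need Python 3.6 or later). This sketch checks that all three produce the identical string, and shows %d and the literal %% escape:

```python
x = 45.6
y = -0.324876

old_style = "Now x is %8.3f and y is %8.3f" % (x, y)
format_method = "Now x is {:8.3f} and y is {:8.3f}".format(x, y)
f_string = f"Now x is {x:8.3f} and y is {y:8.3f}"

print(old_style)                                 # Now x is   45.600 and y is   -0.325
print(old_style == format_method == f_string)    # True

# %d formats an integer; %% produces a literal percent sign
summary = "%d items, %5.1f%% done" % (3, 42.0)
print(summary)                                   # 3 items,  42.0% done
```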

<h2>Jupyter Notebook</h2>

The Jupyter notebook is fairly new and changing rapidly. Install the notebook by 

Then start the notebook via:

In [164]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 5), columns=["A", "B", "C", "D", "E"])

In [169]:
# Keep only the columns whose value in the first row is below 0.5
df.loc[:, df.iloc[0, :] < 0.5]

          A         B         C         E
0  0.159699  0.123944  0.483297  0.426953
1  0.083522  0.004395  0.459733  0.604684
2  0.409187  0.050444  0.014837  0.544117
3  0.760465  0.759616  0.062657  0.413452
4  0.851981  0.696130  0.061952  0.562658
5  0.278712  0.249822  0.231442  0.672320
6  0.713089  0.562738  0.585630  0.897577
7  0.204244  0.539397  0.031190  0.702550
8  0.929349  0.096289  0.755836  0.047888
9  0.266317  0.798710  0.603293  0.911597


Constructing a DataFrame from a dictionary, where the keys become the column names:

In [170]:
import pandas as pd

In [175]:
# Split each document into words. (In Python 3, map() returns a one-shot
# iterator, so a list comprehension is used to get a reusable list.)
spam_corpus = [doc.split() for doc in ["buy viagra", "buy antibody"]]
print(spam_corpus)

[['buy', 'viagra'], ['buy', 'antibody']]


In [176]:
# sorted() gives a reproducible order; the iteration order of a set is arbitrary
unique_words = sorted(set(word for doc in spam_corpus for word in doc))
print(unique_words)

['antibody', 'buy', 'viagra']


In [177]:
# For each word, count how often it occurs in every document
word_counts = [(word, [doc.count(word) for doc in spam_corpus]) for word in unique_words]
print(word_counts)

[('antibody', [0, 1]), ('buy', [1, 1]), ('viagra', [1, 0])]


In [178]:
spam_bag_of_words = pd.DataFrame(dict(word_counts))
print(spam_bag_of_words)

   antibody  buy  viagra
0         0    1       1
1         1    1       0


In [180]:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
url = "http://vincentarelbundock.github.io/Rdatasets/csv/HistData/Guerry.csv"
dat = pd.read_csv(url)
results = smf.ols("Lottery ~ Literacy + np.log(Pop1831)", data=dat).fit()
results.summary()

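                            OLS Regression Results
==============================================================================
Dep. Variable:               Lottery   R-squared:                        0.348
Model:                           OLS   Adj. R-squared:                   0.333
Method:                Least Squares   F-statistic:                       22.2
Date:               Tue, 15 Oct 2019   Prob (F-statistic):             1.9e-08
Time:                       02:09:06   Log-Likelihood:                 -379.82
No. Observations:                 86   AIC:                              765.6
Df Residuals:                     83   BIC:                              773.0
Df Model:                          2
Covariance Type:           nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept         246.4341     35.233      6.995      0.000     176.358     316.510
Literacy           -0.4889      0.128     -3.832      0.000      -0.743      -0.235
np.log(Pop1831)   -31.3114      5.977     -5.239      0.000     -43.199     -19.424
==============================================================================
Omnibus:                       3.713   Durbin-Watson:                    2.019
Prob(Omnibus):                 0.156   Jarque-Bera (JB):                 3.394
Skew:                         -0.487   Prob(JB):                         0.183
Kurtosis:                      3.003   Cond. No.                         702.0
==============================================================================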


In [181]:
dat

    Unnamed: 0  dept  Region    Department  Crime_pers  Crime_prop  Literacy  Donations  Infants  Suicides  ...  Crime_parents  Infanticide  Donation_clergy  Lottery  Desertion  Instruction  Prostitutes  Distance  Area  Pop1831
 0           1     1       E           Ain       28870       15890        37       5098    33120     35039  ...             71           60               69       41         55           46           13   218.372  5762   346.03
 1           2     2       N         Aisne       26226        5521        51       8901    14572     12831  ...              4           82               36       38         82           24          327    65.945  7369   513.00
 2           3     3       C        Allier       26747        7925        13      10973    17044    114121  ...             46           42               76       66         16           85           34   161.927  7340   298.26
 3           4     4       E  Basses-Alpes       12935        7289        46       2733    23018     14238  ...             70           12               37       80         32           29            2   351.399  6925   155.90
 4           5     5       E  Hautes-Alpes       17488        8174        69       6962    23076     16171  ...             22           23               64       79         35            7            1   320.280  5549   129.10
..         ...   ...     ...           ...         ...         ...       ...        ...      ...       ...  ...            ...          ...              ...      ...        ...          ...          ...       ...   ...      ...
81          82    86       W        Vienne       15010        4710        25       8922    35224     21851  ...             20            1               44       40         38           65           18   170.523  6990   282.73
82          83    87       C  Haute-Vienne       16256        6402        13      13817    19940     33497  ...             68            6               78       55         11           84            7   198.874  5520   285.13
83          84    88       E        Vosges       18835        9044        62       4040    14978     33029  ...             58           34                5       14         85           11           43   174.477  5874   397.99
84          85    89       C         Yonne       18006        6516        47       4276    16616     12789  ...             32           22               35       51         66           27          272    81.797  7427   352.49


In [182]:
# initialise data of lists. 
data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]} 
  
# Create DataFrame 
df = pd.DataFrame(data) 
  
# Print the output. 
df 

    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18


In [183]:
# importing pandas as pd 
import pandas as pd 
  
# importing numpy as np 
import numpy as np 
  
# dictionary of lists 
# (named scores rather than dict, so the built-in dict type is not shadowed)
scores = {'First Score':[100, 90, np.nan, 95], 
          'Second Score': [30, 45, 56, np.nan], 
          'Third Score':[np.nan, 40, 80, 98]} 
  
# creating a dataframe from dictionary 
df = pd.DataFrame(scores) 
  
# filling missing value using fillna()   
df.fillna(0) 

   First Score  Second Score  Third Score
0        100.0          30.0          0.0
1         90.0          45.0         40.0
2          0.0          56.0         80.0
3         95.0           0.0         98.0


In [187]:
df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']], columns=['animal', 'name'])
pd.concat([df1, df4], axis=1)

  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george
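With axis=1, pd.concat places the frames side by side. The default axis=0 stacks rows instead; here is a short sketch, where df2 is a hypothetical second frame with the same columns as df1:

```python
import pandas as pd

df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
df2 = pd.DataFrame([['c', 3], ['d', 4]], columns=['letter', 'number'])

# axis=0 (the default) stacks rows; ignore_index=True renumbers the index 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
print(stacked.shape)   # (4, 2)
```

Without ignore_index=True, the result would keep the original index labels, so 0 and 1 would each appear twice.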


<h2>Data Types </h2>

In [2]:
x = 4
print(x)       # Prints "4"
print(type(x)) # Prints "<class 'int'>"
print(x + 1)   # Addition; prints "5"
print(x - 1)   # Subtraction; prints "3"
print(x * 2)   # Multiplication; prints "8"
print(x ** 2)  # Exponentiation; prints "16"
x += 1
print(x)  # Prints "5"
x *= 2
print(x)  # Prints "10"
y = 1.5
print(type(y)) # Prints "<class 'float'>"
print(y, y + 1, y * 2, y ** 2) # Prints "1.5 2.5 3.0 2.25"

4
<class 'int'>
5
3
8
16
5
10
<class 'float'>
1.5 2.5 3.0 2.25
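Division is one place where the int and float types interact in a way worth seeing directly. In Python 3, / always returns a float, while // performs floor division:

```python
x = 7
true_div = x / 2      # 3.5: "/" always returns a float in Python 3
floor_div = x // 2    # 3: floor division of two ints returns an int
remainder = x % 2     # 1
neg_floor = -x // 2   # -4: floor division rounds toward negative infinity
print(true_div, floor_div, remainder, neg_floor)   # 3.5 3 1 -4
print(type(true_div), type(floor_div))             # <class 'float'> <class 'int'>
```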


In [4]:
def quicksort(array):
    if len(array) <= 1:
        return array
    pivot = array[len(array) // 2]
    left = [x for x in array if x < pivot]
    middle = [x for x in array if x == pivot]
    right = [x for x in array if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([4,5,3,1,2,10,9]))
# Prints "[1, 2, 3, 4, 5, 9, 10]"

[1, 2, 3, 4, 5, 9, 10]


<h2>Logical Expressions</h2>

In [5]:
t = True
f = False
print(type(t)) # Prints "<class 'bool'>"
print(t and f) # Logical AND; prints "False"
print(t or f)  # Logical OR; prints "True"
print(not t)   # Logical NOT; prints "False"
print(t != f)  # Logical XOR; prints "True"

<class 'bool'>
False
True
False
True


<h2>Strings</h2>Python has great support for strings:

In [3]:
hello = 'hello'    # String literals can use single quotes
world = "world"    # or double quotes; it does not matter.
print(hello)       # Prints "hello"
print(len(hello))  # String length; prints "5"
hw = hello + ' ' + world  # String concatenation
print(hw)  # prints "hello world"
hw12 = '%s %s %d' % (hello, world, 12)  # sprintf style string formatting
print(hw12)  # prints "hello world 12"

hello
5
hello world
hello world 12


String objects have a bunch of useful methods; for example:

In [4]:
s = "hello"
print(s.capitalize())  # Capitalize a string; prints "Hello"
print(s.upper())       # Convert a string to uppercase; prints "HELLO"
print(s.rjust(7))      # Right-justify a string, padding with spaces; prints "  hello"
print(s.center(7))     # Center a string, padding with spaces; prints " hello "
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another;
                                # prints "he(ell)(ell)o"
print('  world '.strip())  # Strip leading and trailing whitespace; prints "world"

Hello
HELLO
  hello
 hello 
he(ell)(ell)o
world


<h2>Containers</h2>
Python includes several built-in container types: lists, dictionaries, sets, and tuples.


<h2>Lists</h2>
A list is the Python equivalent of an array, but it is resizable and can contain elements of different types:

In [5]:
xs = [3, 1, 2]    # Create a list
print(xs, xs[2])  # Prints "[3, 1, 2] 2"
print(xs[-1])     # Negative indices count from the end of the list; prints "2"
xs[2] = 'foo'     # Lists can contain elements of different types
print(xs)         # Prints "[3, 1, 'foo']"
xs.append('bar')  # Add a new element to the end of the list
print(xs)         # Prints "[3, 1, 'foo', 'bar']"
x = xs.pop()      # Remove and return the last element of the list
print(x, xs)      # Prints "bar [3, 1, 'foo']"


[3, 1, 2] 2
2
[3, 1, 'foo']
[3, 1, 'foo', 'bar']
bar [3, 1, 'foo']


<h2>Slicing</h2>In addition to accessing list elements one at a time, Python provides concise syntax to access sublists; this is known as slicing:

In [6]:
nums = list(range(5))     # range(5) yields the integers 0..4; list() collects them into a list
print(nums)               # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4])          # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:])           # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2])           # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:])            # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print(nums[:-1])          # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9]        # Assign a new sublist to a slice
print(nums)               # Prints "[0, 1, 8, 9, 4]"

[0, 1, 2, 3, 4]
[2, 3]
[2, 3, 4]
[0, 1]
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
[0, 1, 8, 9, 4]


<h2>Loops</h2> You can loop over the elements of a list like this:

In [8]:
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print(animal)
# Prints "cat", "dog", "monkey", each on its own line.

cat
dog
monkey
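If you also need the index of each element inside the loop, the built-in enumerate function yields (index, element) pairs. A short sketch that collects labeled strings into a list:

```python
animals = ['cat', 'dog', 'monkey']
labels = []
for idx, animal in enumerate(animals):   # idx counts from 0
    labels.append('#%d: %s' % (idx + 1, animal))
print(labels)   # ['#1: cat', '#2: dog', '#3: monkey']
```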


<h2>List comprehensions</h2> When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:

In [9]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares)   # Prints [0, 1, 4, 9, 16]

[0, 1, 4, 9, 16]


You can make this code simpler using a list comprehension:

In [10]:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print(squares)   # Prints [0, 1, 4, 9, 16]

[0, 1, 4, 9, 16]


List comprehensions can also contain conditions:

In [11]:
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # Prints "[0, 4, 16]"

[0, 4, 16]


<h2>Dictionaries</h2>A dictionary stores (key, value) pairs:

In [1]:
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary; prints "cute"
print('cat' in d)     # Check if a dictionary has a given key; prints "True"
d['fish'] = 'wet'     # Set an entry in a dictionary
print(d['fish'])      # Prints "wet"
# print(d['monkey'])  # KeyError: 'monkey' not a key of d
print(d.get('monkey', 'N/A'))  # Get an element with a default; prints "N/A"
print(d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"
del d['fish']         # Remove an element from a dictionary
print(d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"

cute
True
wet
N/A
wet
N/A


Loops: It is easy to iterate over the keys in a dictionary:

In [2]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print('A %s has %d legs' % (animal, legs))

A person has 2 legs
A cat has 4 legs
A spider has 8 legs


If you want access to keys and their corresponding values, use the items method:

In [12]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.items():
    print('A %s has %d legs' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"

A person has 2 legs
A cat has 4 legs
A spider has 8 legs


<h2>Dictionary comprehensions</h2> These are similar to list comprehensions, but allow you to easily construct dictionaries. For example:

In [13]:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print(even_num_to_square)  # Prints "{0: 0, 2: 4, 4: 16}"

{0: 0, 2: 4, 4: 16}


<h2>Sets</h2>
A set is an unordered collection of distinct elements. As a simple example, consider the following:

In [14]:
animals = {'cat', 'dog'}
print('cat' in animals)   # Check if an element is in a set; prints "True"
print('fish' in animals)  # prints "False"
animals.add('fish')       # Add an element to a set
print('fish' in animals)  # Prints "True"
print(len(animals))       # Number of elements in a set; prints "3"
animals.add('cat')        # Adding an element that is already in the set does nothing
print(len(animals))       # Prints "3"
animals.remove('cat')     # Remove an element from a set
print(len(animals))       # Prints "2"

True
False
True
3
3
2
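Sets also support the usual mathematical set operations through operators. Because set order is arbitrary, this sketch wraps each result in sorted() so the printed output is predictable; the two example sets are made up for illustration:

```python
pets = {'cat', 'dog', 'fish'}
farm = {'cow', 'dog', 'cat'}

print(sorted(pets | farm))   # union: ['cat', 'cow', 'dog', 'fish']
print(sorted(pets & farm))   # intersection: ['cat', 'dog']
print(sorted(pets - farm))   # difference: ['fish']
```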


Loops: Iterating over a set has the same syntax as iterating over a list; however since sets are unordered, you cannot make assumptions about the order in which you visit the elements of the set:

In [15]:
animals = {'cat', 'dog', 'fish'}
for idx, animal in enumerate(animals):
    print('#%d: %s' % (idx + 1, animal))
# Prints each animal with its position, e.g. "#1: fish"; the order may differ between runs

#1: dog
#2: cat
#3: fish


Set comprehensions: Like lists and dictionaries, we can easily construct sets using set comprehensions:

In [16]:
from math import sqrt
nums = {int(sqrt(x)) for x in range(30)}
print(nums)  # Prints "{0, 1, 2, 3, 4, 5}"

{0, 1, 2, 3, 4, 5}


<h2>Tuples</h2>
A tuple is an (immutable) ordered list of values. A tuple is in many ways similar to a list; one of the most important differences is that tuples can be used as keys in dictionaries and as elements of sets, while lists cannot. Here is a trivial example:

In [17]:
d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
t = (5, 6)        # Create a tuple
print(type(t))    # Prints "<class 'tuple'>"
print(d[t])       # Prints "5"
print(d[(1, 2)])  # Prints "1"

<class 'tuple'>
5
1
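The practical consequences of immutability can be checked directly: a tuple can be unpacked into separate names, but its items cannot be reassigned, and a list cannot serve as a dictionary key because it is unhashable. A short sketch:

```python
t = (5, 6)
a, b = t                       # tuple unpacking assigns each element to a name
print(a, b)                    # 5 6

tuple_rejected = False
try:
    t[0] = 7                   # tuples are immutable, so item assignment fails
except TypeError as err:
    tuple_rejected = True
    print("tuple:", err)

list_key_rejected = False
try:
    d = {[5, 6]: 5}            # lists are mutable and unhashable, so not valid keys
except TypeError as err:
    list_key_rejected = True
    print("list key:", err)
```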


<h2>Functions</h2>
Python functions are defined using the def keyword. For example:

In [18]:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))
# Prints "negative", "zero", "positive"

negative
zero
positive


We will often define functions to take optional keyword arguments, like this:

In [19]:
def hello(name, loud=False):
    if loud:
        print('HELLO, %s!' % name.upper())
    else:
        print('Hello, %s' % name)

hello('Bob') # Prints "Hello, Bob"
hello('Fred', loud=True)  # Prints "HELLO, FRED!"

Hello, Bob
HELLO, FRED!


<h2>Classes</h2>
The syntax for defining classes in Python is straightforward:

In [20]:
class Greeter(object):

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print('HELLO, %s!' % self.name.upper())
        else:
            print('Hello, %s' % self.name)

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"


Hello, Fred
HELLO, FRED!


<h2>Numpy</h2>
Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. If you are already familiar with MATLAB, you might find the "NumPy for MATLAB users" guide in the NumPy documentation useful to get started with Numpy.


<h2>Arrays</h2>
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [21]:
import numpy as np

a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"

b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array
print(b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"

<class 'numpy.ndarray'>
(3,)
1 2 3
[5 2 3]
(2, 3)
1 2 4


Numpy also provides many functions to create arrays:

In [22]:
import numpy as np

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[0. 0.]
                      #          [0. 0.]]"

b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[1. 1.]]"

c = np.full((2,2), 7)  # Create a constant array (integer, since 7 is an int)
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[1. 0.]
                      #          [0. 1.]]"

e = np.random.random((2,2))  # Create an array filled with random values
print(e)                     # Might print "[[0.91940167 0.08143941]
                             #               [0.68744134 0.87236687]]"

[[0. 0.]
 [0. 0.]]
[[1. 1.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[0.42361111 0.20305681]
 [0.704872   0.29104113]]
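Two other frequently used constructors are np.arange, which behaves like range but returns an array, and np.linspace, which produces evenly spaced values; reshape then rearranges the same elements into a new shape. A short sketch:

```python
import numpy as np

a = np.arange(6)             # integers 0..5, like range() but as an array
print(a)                     # [0 1 2 3 4 5]
b = a.reshape((2, 3))        # the same 6 elements arranged as 2 rows x 3 columns
print(b)                     # [[0 1 2]
                             #  [3 4 5]]
c = np.linspace(0., 1., 5)   # 5 evenly spaced values from 0 to 1, endpoints included
print(c)                     # [0.   0.25 0.5  0.75 1.  ]
```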


<h2>Array indexing</h2>
Numpy offers several ways to index into arrays.

Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

In [23]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]

# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"

2
77
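If you want a subarray that can be modified without affecting the original, call .copy() on the slice. This sketch uses np.shares_memory to confirm that the slice is a view while the copy is independent:

```python
import numpy as np

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
b = a[:2, 1:3].copy()    # an independent copy, not a view
b[0, 0] = 77
print(a[0, 1])           # 2: the original array is unchanged

print(np.shares_memory(a, a[:2, 1:3]))   # True: a plain slice is a view
print(np.shares_memory(a, b))            # False: the copy has its own data
```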


You can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. Note that this is quite different from the way that MATLAB handles array slicing:



In [24]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)  # Prints "[ 2  6 10] (3,)"
print(col_r2, col_r2.shape)  # Prints "[[ 2]
                             #          [ 6]
                             #          [10]] (3, 1)"

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[ 2  6 10] (3,)
[[ 2]
 [ 6]
 [10]] (3, 1)


Integer array indexing: When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example:

In [25]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.
# The returned array has shape (3,)
print(a[[0, 1, 2], [0, 1, 0]])  # Prints "[1 4 5]"

# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"

# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))  # Prints "[2 2]"

[1 4 5]
[1 4 5]
[2 2]
[2 2]


One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:

In [26]:
import numpy as np

# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

print(a)  # prints "[[ 1  2  3]
          #          [ 4  5  6]
          #          [ 7  8  9]
          #          [10 11 12]]"

# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints "[ 1  6  7 11]"

# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10

print(a)  # prints "[[11  2  3]
          #          [ 4  5 16]
          #          [17  8  9]
          #          [10 21 12]]"

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[ 1  6  7 11]
[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]


Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [27]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)   # Find the elements of a that are bigger than 2;
                     # this returns a numpy array of Booleans of the same
                     # shape as a, where each slot of bool_idx tells
                     # whether that element of a is > 2.

print(bool_idx)      # Prints "[[False False]
                     #          [ True  True]
                     #          [ True  True]]"

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])  # Prints "[3 4 5 6]"

# We can do all of the above in a single concise statement:
print(a[a > 2])     # Prints "[3 4 5 6]"

[[False False]
 [ True  True]
 [ True  True]]
[3 4 5 6]
[3 4 5 6]
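Boolean masks can also appear on the left-hand side of an assignment, and np.where chooses elementwise between two alternatives based on a condition. A short sketch (the cap value 4 is arbitrary):

```python
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

capped = a.copy()              # copy first so a itself is left unchanged
capped[capped > 4] = 4         # every element greater than 4 is set to 4
print(capped)                  # [[1 2]
                               #  [3 4]
                               #  [4 4]]

# np.where(condition, x, y) takes x where condition is True, otherwise y
print(np.where(a > 2, a, 0))   # [[0 0]
                               #  [3 4]
                               #  [5 6]]
```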


<h2>Datatypes</h2>
Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:

In [28]:
import numpy as np

x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64" on most 64-bit Linux/macOS systems (the default integer type is platform-dependent)

x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)                         # Prints "int64"

int64
float64
int64
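You can also convert an existing array to another datatype with astype. This sketch shows that a float-to-integer conversion truncates toward zero, and that smaller dtypes use less memory per element (itemsize is in bytes):

```python
import numpy as np

x = np.array([1.7, -2.3, 3.9])
as_int = x.astype(np.int64)      # float -> int conversion truncates toward zero
print(as_int)                    # [ 1 -2  3]
print(as_int.dtype)              # int64

small = np.array([1, 2, 3], dtype=np.int8)   # 8-bit integers
print(small.itemsize, x.itemsize)            # 1 8
```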


<h2>Array math</h2>
Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

In [29]:
import numpy as np

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[1.         1.41421356]
 [1.73205081 2.        ]]


Note that unlike MATLAB, * is elementwise multiplication, not matrix multiplication. We instead use the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects:



In [30]:
import numpy as np

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))

219
219
[29 67]
[29 67]
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
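In Python 3.5 and later, NumPy arrays also support the @ operator for matrix multiplication, equivalent to np.matmul. For the 1- and 2-dimensional arrays used here, it gives the same results as dot:

```python
import numpy as np

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([9,10])

print(x @ y)             # [[19 22]
                         #  [43 50]]
print(x @ v)             # [29 67]
print(np.matmul(x, y))   # same result as x @ y
```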


Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum:



In [31]:
import numpy as np

x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]
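Other reductions such as mean, max, and min take the same axis argument, and cumsum computes a running total. A short sketch on the same array:

```python
import numpy as np

x = np.array([[1,2],[3,4]])

print(np.mean(x))       # 2.5            mean of all elements
print(x.max(axis=0))    # [3 4]          maximum of each column
print(x.min(axis=1))    # [1 3]          minimum of each row
print(x.cumsum())       # [ 1  3  6 10]  running sum over the flattened array
```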


Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object:

In [32]:
import numpy as np

x = np.array([[1,2], [3,4]])
print(x)    # Prints "[[1 2]
            #          [3 4]]"
print(x.T)  # Prints "[[1 3]
            #          [2 4]]"

# Note that taking the transpose of a rank 1 array does nothing:
v = np.array([1,2,3])
print(v)    # Prints "[1 2 3]"
print(v.T)  # Prints "[1 2 3]"

[[1 2]
 [3 4]]
[[1 3]
 [2 4]]
[1 2 3]
[1 2 3]
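To get a genuine row or column vector from a rank 1 array, add an axis explicitly with np.newaxis or reshape; after that, transposing does swap the axes:

```python
import numpy as np

v = np.array([1,2,3])          # rank 1, shape (3,)
col = v[:, np.newaxis]         # column vector, shape (3, 1)
row = v.reshape(1, -1)         # -1 lets numpy infer the size: row vector, shape (1, 3)
print(col.shape, row.shape)    # (3, 1) (1, 3)
print(col.T.shape)             # (1, 3)
```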


<h2>Broadcasting</h2>
Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [34]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an uninitialized matrix with the same shape and dtype as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

# Now y is the following
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


This works; however when the matrix x is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing elementwise summation of x and vv. We could implement this approach like this:

In [36]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1))   # Stack 4 copies of v on top of each other
print(vv)                 # Prints "[[1 0 1]
                          #          [1 0 1]
                          #          [1 0 1]
                          #          [1 0 1]]"
y = x + vv  # Add x and vv elementwise
print(y)  # Prints "[[ 2  2  4]
          #          [ 5  5  7]
          #          [ 8  8 10]
          #          [11 11 13]]"

[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

In [37]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)  # Prints "[[ 2  2  4]
          #          [ 5  5  7]
          #          [ 8  8 10]
          #          [11 11 13]]"

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


The line y = x + v works even though x has shape (4, 3) and v has shape (3,) due to broadcasting; this line works as if v actually had shape (4, 3), where each row was a copy of v, and the sum was performed elementwise.

Broadcasting two arrays together follows these rules:
<ol>
<li>If the arrays do not have the same rank, prepend the shape of the lower-rank array with 1s until both shapes have the same length.</li>
<li>The two arrays are said to be compatible in a dimension if they have the same size in that dimension, or if one of the arrays has size 1 in that dimension.</li>
<li>The arrays can be broadcast together if they are compatible in all dimensions.</li>
<li>After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of the shapes of the two input arrays.</li>
<li>In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.</li>
</ol>
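The broadcasting rules can be sketched as a small function on shape tuples. broadcast_shape is a hypothetical helper written for illustration, not a NumPy API; the result is cross-checked against NumPy's own np.broadcast:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the broadcasting rules to two shape tuples (illustrative helper)."""
    # Rule 1: left-pad the shorter shape with 1s so both have the same length.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Rules 2 and 3: each dimension must match, or one side must be 1.
        if da != db and da != 1 and db != 1:
            raise ValueError("shapes %s and %s are not compatible" % (shape_a, shape_b))
        # Rules 4 and 5: a size-1 dimension stretches to the other size.
        result.append(db if da == 1 else da)
    return tuple(result)

print(broadcast_shape((4, 3), (3,)))    # (4, 3): x + v from the example above
print(broadcast_shape((3, 1), (2,)))    # (3, 2): an outer-product shape
print(np.broadcast(np.empty((3, 1)), np.empty((2,))).shape)   # (3, 2)

try:
    broadcast_shape((2, 3), (2,))       # adding w of shape (2,) to x of shape (2, 3)
except ValueError as err:
    print("Not broadcastable:", err)
```

The last case is why the column-wise addition in the next example needs a transpose or a reshape of w to shape (2, 1).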


Here are some applications of broadcasting:

In [38]:
import numpy as np

# Compute outer product of vectors
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
print(np.reshape(v, (3, 1)) * w)

# Add a vector to each row of a matrix
x = np.array([[1,2,3], [4,5,6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:
# [[2 4 6]
#  [5 7 9]]
print(x + v)

# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:
# [[ 5  6  7]
#  [ 9 10 11]]
print((x.T + w).T)
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w, (2, 1)))

# Multiply a matrix by a constant:
# x has shape (2, 3). Numpy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3), producing the
# following array:
# [[ 2  4  6]
#  [ 8 10 12]]
print(x * 2)


[[ 4  5]
 [ 8 10]
 [12 15]]
[[2 4 6]
 [5 7 9]]
[[ 5  6  7]
 [ 9 10 11]]
[[ 5  6  7]
 [ 9 10 11]]
[[ 2  4  6]
 [ 8 10 12]]
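A common alternative to np.reshape for adding a length-1 axis is indexing with np.newaxis (which is simply an alias for None). This sketch redoes the outer product and the column-wise addition from the cell above that way, and checks the results against np.outer and the transpose-based version.

```python
import numpy as np

v = np.array([1, 2, 3])               # shape (3,)
w = np.array([4, 5])                  # shape (2,)
x = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# v[:, np.newaxis] has shape (3, 1), the same as np.reshape(v, (3, 1))
outer = v[:, np.newaxis] * w
print(outer.shape)                            # (3, 2)
print(np.array_equal(outer, np.outer(v, w)))  # True

# w[:, np.newaxis] has shape (2, 1), so it broadcasts against x's columns
cols = x + w[:, np.newaxis]
print(np.array_equal(cols, (x.T + w).T))      # True
```

Indexing with np.newaxis avoids spelling out the full target shape, which is convenient when the array's length is not known in advance.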


<h2>SciPy</h2>
Numpy provides a high-performance multidimensional array and basic tools to compute with and manipulate these arrays. SciPy builds on this, and provides a large number of functions that operate on numpy arrays and are useful for different types of scientific and engineering applications.
<h2>Image operations</h2>
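Older versions of SciPy provided some basic functions in scipy.misc for working with images: reading images from disk into numpy arrays, writing numpy arrays to disk as images, and resizing images. Here is a simple example that showcases these functions. As the warnings it prints indicate, they were deprecated in SciPy 1.0.0 and have since been removed (imread, imsave and imshow in 1.2.0, imresize in 1.3.0); current code should use imageio.imread and imageio.imwrite, Pillow, and matplotlib.pyplot.imshow instead.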



In [42]:
from scipy.misc import imread, imsave, imresize, imshow

# Read a JPEG image into a numpy array
img = imread('cat.jpeg')
print(img.dtype, img.shape)  # Prints "uint8 (331, 500, 3)" for the image used here

# We can tint the image by scaling each of the color channels
# by a different scalar constant. The image has shape (331, 500, 3);
# we multiply it by the array [1, 0.95, 0.9] of shape (3,);
# numpy broadcasting means that this leaves the red channel unchanged,
# and multiplies the green and blue channels by 0.95 and 0.9
# respectively.
img_tinted = img * [1, 0.95, 0.9]

# Resize the tinted image to be 300 by 300 pixels.
img_tinted = imresize(img_tinted, (300, 300))

# Write the tinted image back to disk
imsave('cat_tinted.jpg', img_tinted)

uint8 (331, 500, 3)


`imread` is deprecated in SciPy 1.0.0, and will be removed in 1.2.0.
Use ``imageio.imread`` instead.
`imresize` is deprecated in SciPy 1.0.0, and will be removed in 1.3.0.
Use Pillow instead: ``numpy.array(Image.fromarray(arr).resize())``.
`imsave` is deprecated in SciPy 1.0.0, and will be removed in 1.2.0.
Use ``imageio.imwrite`` instead.
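One detail worth knowing about the tint step: multiplying a uint8 image by [1, 0.95, 0.9] produces a float64 array, not uint8. The sketch below reproduces that step with NumPy alone on a synthetic random image (a hypothetical stand-in for cat.jpeg, so no file is needed) and converts the result back to 8-bit pixels, which is usually what you want before handing it to an image writer such as imageio.imwrite.

```python
import numpy as np

# Hypothetical stand-in for a real photo: a random uint8 RGB image
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(331, 500, 3), dtype=np.uint8)

# The (3,) scale vector is broadcast against the last axis (color channels)
scale = np.array([1, 0.95, 0.9])
img_tinted = img * scale
print(img_tinted.dtype, img_tinted.shape)   # float64 (331, 500, 3)

# Back to 8-bit pixels: clip guards against scale factors above 1,
# and astype truncates the fractional part
img_tinted_u8 = np.clip(img_tinted, 0, 255).astype(np.uint8)

# The red channel (index 0) is unchanged; green is never brighter than before
print(np.array_equal(img_tinted_u8[..., 0], img[..., 0]))   # True
print(bool((img_tinted_u8[..., 1] <= img[..., 1]).all()))   # True
```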


In [43]:
imshow(img)

`imshow` is deprecated in SciPy 1.0.0, and will be removed in 1.2.0.
Use ``matplotlib.pyplot.imshow`` instead.
