## Table of Contents
* [Why learn Python for data analysis?](#Why-learn-Python-for-data-analysis?)
* [Python Data Structures](#Python-Data-Structures)
* [Conditional and iterative statements](#Conditional-and-iterative-statements)
* [Loading data](#Loading-data)
* [Understand pandas dataframes](#Understand-pandas-dataframes)

### Why learn Python for data analysis?

Python has recently gained a lot of popularity as a language for data analysis. Compared with SAS and R, here are some reasons to learn Python:

* Open source: free to install
* Large, active online community
* Easy to learn
* Potential to become the common language for data science and for building web-based analytics products

That said, it has a few drawbacks too:

* It is an interpreted rather than a compiled language, so it can take up more CPU time. However, given the programmer time saved by its ease of learning, it can still be a good choice.
* Python is rarely used for client-side software such as mobile apps, where faster and more efficient languages are preferred.

Let's do a simple addition in Jupyter Notebook.

*Note: In Python, "#" starts a comment. Everything after it on that line is meant for the programmer to read and is ignored when the code runs.*

```python
# Add 2 & 2 and assign it to "addition" variable
addition = 2 + 2
```

```python
print(addition) # print addition
```

In Jupyter Notebook, you can display a variable's value simply by typing its name as the last line of a cell. For example:

```python
addition
```

Note: Only the last expression in a cell is displayed automatically, so you should still use `print()` to control exactly what gets printed and what doesn't.

```python
# multiply 4 & 4 and assign it to "multiplication"
multiplication = 4 * 4
```

```python
print(multiplication) # print multiplication
```
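Besides `+` and `*`, Python has a few other arithmetic operators you will use often. The sketch below shows them; note that `/` always returns a float in Python 3, even when the division is exact.

```python
# True division always returns a float in Python 3
quotient = 7 / 2          # 3.5
# Floor division discards the fractional part
floor_quotient = 7 // 2   # 3
# Modulo gives the remainder of a division
remainder = 7 % 2         # 1
# Exponentiation raises a number to a power
power = 2 ** 3            # 8

print(quotient, floor_quotient, remainder, power)
print(8 / 4)              # 2.0, a float even though 8 is divisible by 4
```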

**Exercise**

Q1. Add the numbers 3 and 4, then assign the result to a variable named "answer".

Q2. Divide 6 by 3, then print the result.

### Python Data Structures

The following are some of the data structures used in Python. You should be familiar with them in order to use them appropriately.

#### Lists
Lists are one of the most versatile data structures in Python. A list is defined by writing comma-separated values in square brackets. Lists can contain items of different types, but usually all the items have the same type. Python lists are mutable, so individual elements of a list can be changed.

Here is a quick example of how to define a list and then access it.

```python
square_list = [0, 1, 4, 9, 16, 25] # define a list
```

```python
print(square_list) # print square_list
```

Individual elements of a list can be accessed by writing the index number in square brackets. Note that list indices start at 0, not 1.

```python
print(square_list[0]) # print first element of list
```

A range of elements (a slice) can be accessed by giving a start and an end index separated by a colon.

```python
print(square_list[2:4]) # slice square_list: elements at index 2 and 3
```

Note that the start index is included, whereas the end index is excluded.

--------

A negative index counts from the end of the list.

```python
print(square_list[-2]) # print second last element in the list
```
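Because lists are mutable, you can grow a list after creating it or replace one of its elements in place. A short sketch, redefining `square_list` so it runs on its own:

```python
square_list = [0, 1, 4, 9, 16, 25]

# append() adds a new element to the end of the list (6 squared)
square_list.append(36)
# Assigning to an index replaces that element in place; lists may mix types
square_list[0] = 'zero'

print(square_list)       # ['zero', 1, 4, 9, 16, 25, 36]
print(len(square_list))  # 7
```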

#### Strings 
Strings can be defined using single ( `'` ), double ( `"` ) or triple ( `'''` or `"""` ) quotes. Strings enclosed in triple quotes can span multiple lines and are frequently used in docstrings (Python's way of documenting functions). The backslash ( `\` ) is used as an escape character. Python strings are immutable, so you cannot change part of a string in place.

Here is a quick example of working with a string:

```python
greeting = "Hello"              # assign a string
print(greeting[1])              # print character at index 1
print(len(greeting))            # print length of string
print(greeting + "World")       # string concatenation
```

Raw strings keep the text exactly as written: the interpreter does not treat backslashes as escape characters. A raw string is defined by adding "r" before the opening quote.

```python
stmt = r'\n is a newline character by default'
print(stmt)
```

Python strings are immutable, which means they cannot be changed in place. Trying to change a character raises a `TypeError`:

```python
greeting[1] = 'i'  # raises TypeError: 'str' object does not support item assignment
```
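Since a string cannot be modified in place, the usual approach is to build a new string from pieces of the old one, for example with slicing and concatenation or with the `str.replace` method. A minimal sketch:

```python
greeting = "Hello"

# Build a new string: first character + 'i' + everything from index 2 on
changed = greeting[0] + 'i' + greeting[2:]
print(changed)            # Hillo

# str.replace returns a new string; the original is left untouched
replaced = greeting.replace('e', 'a')
print(replaced)           # Hallo
print(greeting)           # Hello
```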


#### Dictionary
A dictionary is a set of key: value pairs, with the requirement that the keys are unique within one dictionary. Since Python 3.7, dictionaries remember the order in which keys were inserted. A pair of braces creates an empty dictionary: `{}`


```python
d = {'One': 1}
```

Here, "One" is the key and 1 is its value in the dictionary "d".

A dictionary can hold many key-value pairs. For example, this one maps names to phone extensions:

```python
extensions = {'Kunal': 9073, 'Tavish': 9128, 'Sunil': 9223, 'Nitin': 9330}
print(extensions)
```

A new key-value pair is added by assigning a value to a new key:

```python
extensions['Mukesh'] = 9410
```

Assigning to a key that already exists overwrites its value:

```python
print('Before: ', extensions['Mukesh'])
extensions['Mukesh'] = 9150
print('After: ', extensions['Mukesh'])
```

`keys()` and `values()` return the dictionary's keys and values, respectively:

```python
print(extensions.keys())
print(extensions.values())
```
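Looking up a key that does not exist with square brackets raises a `KeyError`. The `get` method returns a default instead, and `items()` lets you loop over keys and values together. A short sketch using the same dictionary; the name 'Anna' is just a made-up key that is not in it:

```python
extensions = {'Kunal': 9073, 'Tavish': 9128, 'Sunil': 9223, 'Nitin': 9330}

# 'Anna' is a hypothetical missing key: get() returns None or a given default
print(extensions.get('Anna'))          # None
print(extensions.get('Anna', 0))       # 0

# 'in' checks whether a key exists
print('Kunal' in extensions)           # True

# items() yields (key, value) pairs in insertion order
for name, ext in extensions.items():
    print(name, ext)
```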

**Exercise**

Q1. Make a list of the names "Alpha", "Beta", "Gamma", "Theta" and "Omega". Assign it to a variable named "names", then print the fourth name from the beginning.

Q2. Create a dictionary with the keys "Alpha", "Beta", "Gamma", "Theta" and "Omega", with values in ascending order starting from 1, i.e.:

    Alpha -> 1
    Beta -> 2
    Gamma -> 3
and so on

### Conditional and iterative statements

Conditional statements are used to execute pieces of code depending on whether a condition is true. The most commonly used construct is if-else, with the following syntax:

    if [condition]:
      __execution if true__
    else:
      __execution if false__
      
Note the indentation before `__execution if true__` and `__execution if false__`. Python uses indentation to define which statements belong to a block, so it is mandatory: if you don't indent correctly, Python raises an error.

```python
# Deliberately broken: the body of the if statement is not indented
if 2%3 == 1:
print("yes")
```

As you can see, the Python interpreter reports an `IndentationError`.

------------

For instance, if we want to print whether the number N is even or odd:

    N = 7
    if N % 2 == 0:
        print('Even')
    else:
        print('Odd')
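When there are more than two cases, `elif` ("else if") adds further conditions that are checked in order; only the first branch whose condition is true runs. A small sketch that classifies a number (the variable `label` is just an illustrative name):

```python
N = -3

# Conditions are checked top to bottom; the first true one wins
if N > 0:
    label = 'positive'
elif N < 0:
    label = 'negative'
else:
    label = 'zero'

print(label)   # negative
```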

Like most languages, Python also has a `for` loop, which is the most widely used method of iteration. It has a simple syntax:

    for i in [Python Iterable]:
      expression(i)

Here, "Python Iterable" can be a list, a tuple or another iterable data structure. Let's start with a simple example that prints the numbers from 1 to 9 using `range`:

```python
for i in range(1, 10):
    print(i)
```

Notice that 10 is not printed: `range(1, 10)` includes its start value but excludes its stop value.
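A `for` loop can also build up a result step by step. As a sketch, here is the factorial of a number n (the product 1 × 2 × … × n) computed with `range`; the stop value is n + 1 so that n itself is included:

```python
n = 5
factorial = 1

# range(1, n + 1) yields 1, 2, ..., n
for i in range(1, n + 1):
    factorial = factorial * i

print(factorial)   # 120
```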

**Exercise**

Q1. Print even numbers from the given list - "number_list". 

Let's go one step further in our journey to learn Python by getting acquainted with some useful libraries. The first step is to import them into our environment. There are several ways of doing so in Python:

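```python
# Option 1: import the module under the alias 'm'
import math as m

# Option 2: import every name from math into the current namespace
from math import *
```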

In the first option, we have defined the alias 'm' for the math library. We can now use the functions in math (e.g. factorial) through the alias, as in `m.factorial()`.

In the second option, we have imported the entire math namespace, i.e. you can call factorial() directly without referring to math. This is convenient, but it can silently overwrite names you have already defined, so PEP 8 recommends avoiding wildcard imports.
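A middle ground between the two is to import only the names you need. The sketch below combines the alias style with an explicit import of `factorial` and `sqrt`, so those two can be called without a prefix while nothing else from math enters the namespace:

```python
import math as m                  # alias: call functions as m.<name>
from math import factorial, sqrt  # import only the names you need

print(m.factorial(5))  # 120
print(factorial(5))    # 120
print(sqrt(16))        # 4.0
```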

The following libraries are commonly needed for scientific computing and data analysis in Python:

* **NumPy** stands for Numerical Python. Its most powerful feature is the n-dimensional array. The library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities and tools for integration with lower-level languages such as Fortran, C and C++.
* **SciPy** stands for Scientific Python and is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering tasks, with modules for discrete Fourier transforms, linear algebra, optimization and sparse matrices.
* **Matplotlib** is used for plotting a wide variety of graphs, from histograms to line plots to heat maps. In Jupyter Notebook, the `%matplotlib inline` magic command displays plots inside the notebook. (Older versions of IPython used `ipython notebook --pylab=inline`; without the inline option, pylab turned the IPython environment into one very similar to MATLAB.) You can also use LaTeX commands to add math to your plots.
* **Pandas** is used for structured data operations and manipulation. It is used extensively for data munging and preparation, and has been instrumental in boosting Python's use in the data science community.
* **Scikit-learn** is used for machine learning. Built on NumPy, SciPy and matplotlib, this library contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction.
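To make the NumPy array point concrete: arithmetic on a NumPy array applies element by element, whereas the same operator on a Python list does something different. A small sketch, assuming NumPy is installed and imported under the conventional alias `np`:

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# On a list, * repeats the list
print(values * 2)          # [1, 2, 3, 1, 2, 3]
# On an array, * multiplies every element
print(arr * 2)             # [2 4 6]

# A 2-dimensional array and its shape (rows, columns)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape)        # (2, 3)
# Sum down each column
print(matrix.sum(axis=0))  # [5 7 9]
```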

### Loading data

Finally for today, let's look at how to load a dataset in Python. Data can be stored in many different formats; in this session, we will load a dataset stored in CSV format. For other formats, you can refer to [this article](https://www.analyticsvidhya.com/blog/2017/03/read-commonly-used-formats-using-python/).

```python
import pandas as pd  # import pandas
```

```python
data = pd.read_csv('data.csv') # read file
```

```python
# see only the first five rows
data.head(5)
```
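If you don't have `data.csv` at hand, you can try `read_csv` on an in-memory string instead: `pd.read_csv` accepts any file-like object, such as `io.StringIO`. The three rows below are made up for illustration; only the column names mirror those used in the next section.

```python
import io
import pandas as pd

# Hypothetical sample rows, not taken from the real data.csv
csv_text = """Item_Identifier,Item_Weight,Item_Type
A001,9.3,Dairy
B002,5.92,Soft Drinks
C003,17.5,Meat
"""

data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)   # (3, 3): 3 rows, 3 columns
print(data.head())  # first five rows (here, all three)
print(data.dtypes)  # Item_Weight is parsed as float64
```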

### Understand pandas dataframes

A DataFrame in pandas is a tabular data structure made up of rows and columns, similar to a spreadsheet or a database table. Let's look at some common ways to work with a DataFrame.

```python
# To access a column
data['Item_Identifier']
```

```python
# To access multiple columns, pass a list of column names
data[['Item_Identifier', 'Item_Weight']]
```

```python
# To access a row by its index label (with the default index, label 0 is the first row)
data.loc[0]
```

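```python
# To access multiple rows; unlike list slicing, .loc includes the end label,
# so with the default index this returns rows 0 to 5 (six rows)
data.loc[0:5]
```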

```python
# To access a specific row and column by position
# for example, the value in the 2nd row and 3rd column (positions start at 0)
# .iloc replaces the old .ix indexer, which was removed in pandas 1.0
data.iloc[1, 2]
```
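The difference between `loc` and `iloc` becomes visible once the row labels are not simply 0, 1, 2: `loc` selects by label (and its slices include the end label), while `iloc` selects by integer position (and its slices exclude the end, like Python lists). A sketch on a tiny made-up DataFrame with string row labels:

```python
import pandas as pd

# Hypothetical toy data with string row labels 'a', 'b', 'c'
df = pd.DataFrame(
    {'Item_Weight': [9.3, 5.92, 17.5],
     'Item_Type': ['Dairy', 'Soft Drinks', 'Meat']},
    index=['a', 'b', 'c'],
)

print(df.loc['b', 'Item_Type'])  # label-based: Soft Drinks
print(df.iloc[1, 1])             # position-based: Soft Drinks
print(df.loc['a':'b'])           # label slice includes 'b': 2 rows
print(df.iloc[0:1])              # position slice excludes 1: 1 row
```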

```python
# To keep only the rows where 'Item_Type' is 'Dairy', filter with a boolean condition
data[data['Item_Type'] == 'Dairy']
```
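Conditions can be combined with `&` (and) and `|` (or). Each condition must be wrapped in parentheses, because these operators bind more tightly than comparisons such as `==` and `>`. A sketch on made-up rows (the values are illustrative only):

```python
import pandas as pd

# Hypothetical toy data using the same column names as above
df = pd.DataFrame({
    'Item_Type': ['Dairy', 'Dairy', 'Meat', 'Soft Drinks'],
    'Item_Weight': [9.3, 15.2, 17.5, 5.92],
})

# Rows that are Dairy AND weigh more than 10
heavy_dairy = df[(df['Item_Type'] == 'Dairy') & (df['Item_Weight'] > 10)]
print(heavy_dairy)

# Rows that are Dairy OR Meat; isin() is a shorter way to write this
dairy_or_meat = df[df['Item_Type'].isin(['Dairy', 'Meat'])]
print(len(dairy_or_meat))   # 3
```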

**Exercise**

Q1. Load the file "data.txt" and print its first 10 rows.

That's all for today!
