# 1. Introduction

## What is Python?

Python is a powerful, high-level, general-purpose programming language. Guido van Rossum began developing it in 1989, and the first public release followed in 1991. It is free and open source. Here is a look at some of the most famous applications of Python:

![trend](https://qph.fs.quoracdn.net/main-qimg-5b7953f673dc0823b3e09179a7804109)

Python is a full-fledged programming language that is mostly object-oriented but can also be used in powerful and efficient ways without OOP. It is used in many applications, including (but not limited to):
1. Web Development 
2. Database access 
3. Scientific and Numeric computing 
4. Game Development 


## Python for Data Science
When it comes to data science, Python is a very powerful tool. It is open source and flexible, which adds to its popularity. It has extensive libraries for data manipulation and is easy to learn and use for many forms of data analysis.

Since 2016, Python has overtaken R and leads among data science and machine learning platforms:
![trend](https://www.kdnuggets.com/images/google-trends-python-r-data-science-machine-learning-2012-2017.jpg) 



#### Kaggle 2016 Year Summary
In past years, R was the language of choice on Kaggle, but in 2016 Python emerged as the clear winner in the number of kernels written.
![classification](https://www.kdnuggets.com/images/kaggle-python-vs-r-kernels-2016.jpg)

Python for data science has developed rapidly and is now, along with R, the language of choice for data scientists the world over because of:
1. The package system
2. The libraries developed for Machine Learning 
3. Ability to do other tasks apart from Data Science and the convenience in Integration
4. Of course - Open Source

## Python vs R vs SAS

#### SAS: 
SAS has been the undisputed market leader in the commercial analytics space. The software offers a huge array of statistical functions, has a good GUI (Enterprise Guide & Miner) that helps people learn quickly, and provides strong technical support. However, it ends up being the most expensive option and does not always include the latest statistical functions.

#### R:
R is the open source counterpart of SAS and has traditionally been used in academia and research. Because it is open source, the latest techniques are released quickly. A lot of documentation is available online, and it is a very cost-effective option.

#### Python: 
With its origin as an open source scripting language, Python usage has grown over time. Today, it has libraries (numpy, scipy, matplotlib, etc.) and functions for almost any statistical operation or model building you may want to do. Since the introduction of pandas, it has become very strong in operations on structured data.

## Python 2 vs Python 3

#### Python 3
Python 3 is regarded as the future of Python and is the version of the language that is currently in development. A major overhaul, Python 3 was released in late 2008 to address and amend intrinsic design flaws of previous versions of the language. The focus of Python 3 development was to clean up the codebase and remove redundancy, making it clear that there was only one way to perform a given task.

Major modifications in Python 3.0 included changing the print statement into a built-in function, improving the way integers are divided, and providing more Unicode support.
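The three changes named above can be seen in a few lines. This is a minimal sketch that assumes it runs under Python 3; the variable names are illustrative only.

```python
# print is a built-in function in Python 3, so its arguments go in parentheses
print("Hello", "Python 3", sep=", ")

# "/" performs true division on integers, "//" performs floor division
true_quotient = 7 / 2
floor_quotient = 7 // 2
print(true_quotient, floor_quotient)

# str holds Unicode text by default; converting to bytes is an explicit step
greeting = "caf\u00e9"               # the word "cafe" with an accented e
encoded = greeting.encode("utf-8")
print(len(greeting), len(encoded))   # characters versus bytes
```

Under Python 2, `7 / 2` with two integers would have performed floor division, which was a common source of subtle bugs.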

At first, Python 3 was slowly adopted due to the language not being backwards compatible with Python 2, requiring people to make a decision as to which version of the language to use. Additionally, many package libraries were only available for Python 2, but as the development team behind Python 3 has reiterated that there is an end of life for Python 2 support, more libraries have been ported to Python 3. The increased adoption of Python 3 can be shown by the number of Python packages that now provide Python 3 support, which at the time of writing includes 339 of the 360 most popular Python packages.

#### Python 2.7
Following the 2008 release of Python 3.0, Python 2.7 was published on July 3, 2010 and planned as the last of the 2.x releases. The intention behind Python 2.7 was to make it easier for Python 2.x users to port features over to Python 3 by providing some measure of compatibility between the two. This compatibility support included enhanced modules for version 2.7 like unittest to support test automation, argparse for parsing command-line options, and more convenient classes in collections.

Because of Python 2.7’s unique position as a version in between the earlier iterations of Python 2 and Python 3.0, it has persisted as a very popular choice for programmers due to its compatibility with many robust libraries. When we talk about Python 2 today, we are typically referring to the Python 2.7 release as that is the most frequently used version.

Python 2.7, however, is considered to be a legacy language and its continued development, which today mostly consists of bug fixes, will cease completely in 2020.

# 2. Getting Started

## Package Managers

#### Pip
pip is Python's package manager. It has been bundled with Python for quite a while now, so if you have Python, you likely have pip installed already.

**pip install library_name**

In [None]:
# math ships with Python and never needs installing; matplotlib is a third-party package
!pip install matplotlib

There are currently two popular options for managing your pip packages across projects:
* virtualenv 
* anaconda

### Virtualenv
Virtualenv is a package that allows you to create named “virtual environments”, where you can install pip packages in an isolated manner.
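One way to see this isolation from inside Python is to check which installation prefix the interpreter is using. The sketch below is a best-effort check, and `in_virtual_environment` is a hypothetical helper name. It assumes a virtualenv or `venv` environment; Anaconda environments are not detected this way.

```python
import sys

def in_virtual_environment():
    """Best-effort check for an active virtualenv or venv environment."""
    # Old virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.prefix differ from sys.base_prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("Interpreter prefix:", sys.prefix)
print("Inside a virtual environment:", in_virtual_environment())
```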

### Anaconda
Now, if you are primarily doing data science work, Anaconda is also a great option. It is a Python distribution that comes preinstalled with lots of useful python libraries for data science.

Anaconda is popular because it brings many of the tools used in data science and machine learning with just one install, so it’s great for having short and simple setup.

Like Virtualenv, Anaconda also uses the concept of creating environments so as to isolate different libraries and versions. Anaconda also introduces its own package manager, called conda, from where you can install libraries.

Additionally, Anaconda still has the useful interaction with pip that allows you to install any additional libraries which are not available in the Anaconda package manager.

### pyenv
We can opt to use both and manage everything with a tool called *pyenv*. Conceptually, pyenv sits atop both virtualenv and Anaconda: it controls which virtualenv or Anaconda environment is in use, and also whether you are running Python 2 or Python 3.

Another useful feature of pyenv is the ability to set a default environment for a given directory. The desired environment is then activated automatically when you enter that directory. This is a lot easier than trying to remember which environment you want to use every time you work on a project.


## Importing modules and packages in Python

The Python Standard Library is part of every Python installation. It contains many built-in modules that provide access to system functionality or standardized solutions to common problems.

From within the interpreter you can run the import statement to make sure that the given module is ready to be called. For example:

**import math**

Since math is a built-in module, your interpreter should complete the task with no feedback, returning to the prompt. This means you don’t need to do anything to start using the *math* module.

If a required module is not installed, you'll receive an error like this (`ModuleNotFoundError` is a subclass of `ImportError` that Python 3.6 and later raise):

**ModuleNotFoundError: No module named 'matplotlib'**

In such a case, we can use pip to install the required module.
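A module's availability can also be checked without triggering the error. The sketch below uses `importlib.util.find_spec` from the standard library; `is_installed` is a hypothetical helper name chosen for this example.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

for name in ("math", "matplotlib"):
    if is_installed(name):
        print(name, "-> available")
    else:
        print(name, "-> missing, try: pip install", name)
```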


In [None]:
import math

#### Aliasing Modules
It is possible to change the name under which a module is available in your code by using the `as` keyword.
The construction of this statement looks like this:

**import [module] as [another_name]**

In [None]:
import math as m

print(m.pi)
print(m.e)

In [None]:
dir(math)

# 3. Basic data types and operations in Python

In [None]:
#Enables multiple outputs from each cell 
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [None]:
#The most basic use of Python is that it can be used as a calculator
2+5

In [None]:
40-5*6

In [None]:
# Division always returns a floating point number in Python 3
8/5

In [None]:
17.0 / 3  # true division always returns a float in Python 3
17 // 3  # floor division discards the fractional part
17 % 3  # the % operator returns the remainder of the division
5 * 3 + 2  # result * divisor + remainder
2 ** 7  # 2 to the power of 7
4 * 3.75 - 1 # Integers are converted to floating point (when mixed operands)
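The relationship between floor division and the remainder shown above can be checked directly. This is a small sketch with illustrative variable names; `divmod` is a built-in that returns both values at once.

```python
dividend, divisor = 17, 3

quotient = dividend // divisor     # floor division
remainder = dividend % divisor     # remainder

# divmod returns the quotient and the remainder in a single call
print(divmod(dividend, divisor) == (quotient, remainder))

# the original number can be rebuilt from the two parts
print(quotient * divisor + remainder == dividend)

# floor division rounds toward negative infinity, not toward zero
negative_quotient = -17 // 3
negative_remainder = -17 % 3
print(negative_quotient, negative_remainder)
```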

## Data Types - Numeric: integers, floating point numbers and complex numbers



In [None]:
#Variable assignment
a=10
b=74.92

print(a)
print(b)

In [None]:
type(a)
type(b)

In [None]:
#Change a number to a string
b_string=str(b)
print(b_string)
type(b_string)

In [None]:
#Example of a complex number
z=2+3j

In [None]:
#Extracting the real and imaginary parts, and the complex conjugate
print(z.real)
print(z.imag)
z.conjugate()

## Strings

In [None]:
#Create a string
# Single and Double quotes are the same 

str_1 = 'ABInBev'  # single quotes
str_2 = "ABInBev"  # double quotes

print(str_1)
print(str_2)

'doesn\'t'  # use \' to escape the single quote...

"doesn't"  # ...or use double quotes instead (to enclose single quotes)

In [None]:
# Strings can be concatenated with + and repeated with *

3 * 'Ops' + 'Analytics'

In [None]:
# Strings can be indexed
# Remember, Python indexes start from 0

word = 'AB-InBev'
word[0]  # character in position 0

word[4]  # character in position 4

In [None]:
# Negative indices also work

word[-1]  # last character

word[-2]  # second-last character

word[-6]

In [None]:
# Slicing allows you to obtain substrings
# word = 'AB-InBev'

word[0:2]  # characters from position 0 (included) to 2 (excluded)

word[2:5]  # characters from position 2 (included) to 5 (excluded)

In [None]:
# s[:i] + s[i:] is always equal to s

word[:2] + word[2:]

word[:4] + word[4:]

In [None]:
# Python strings are immutable: each assignment below raises a TypeError
# (run them one at a time, since the first error stops the cell)

word[2]=' '

word[3:] = 'AmBev'

In [None]:
# Can create new strings though

'B' + word[1:]

word[:3] + 'AmBev'

In [None]:
# len() returns length of string 

s = 'Anheuser-Busch InBev'
len(s)

### Printing

In [None]:
x = 'hello'

print(x)

In [None]:
num = 12
name = 'John'

print('My number is: {}, and my name is: {}'.format(num,name))
print('My number is: {one}, and my name is: {two}'.format(one=num,two=name))
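The same output can also be produced with an f-string, which places the expressions directly inside the braces. This sketch assumes Python 3.6 or later, where f-strings were introduced; `message` and `price` are illustrative names.

```python
num = 12
name = 'John'

# An f-string evaluates the expressions inside the braces in place
message = f'My number is: {num}, and my name is: {name}'
print(message)

# Format specifiers work the same way as with str.format
price = 3.14159
print(f'Rounded to two decimals: {price:.2f}')
```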

### Exercises

In [None]:
#Using the given string, return
#1. length of the string
#2. the count of  spaces in the string
#3. index where a substring - "ABI" is matched (where the match ends)
#4. convert the entire string to Lowercase
#5. replace ABI with AB InBev.
#6. swap the case of every letter in the string, i.e., make all uppercase letters lowercase and all lowercase
#   letters uppercase
#7. Find the index of the first and last occurrence of "-"
#8. Using .format, print your first name at the beginning and end of the string

string = " - ABI - We're Here For The Beer - "


### Lists 

A list is a collection which is ordered and changeable. In Python, lists are written with square brackets.

`list.append(x)`
Add an item to the end of the list. Equivalent to `a[len(a):] = [x]`.

`list.extend(iterable)`
Extend the list by appending all the items from the iterable. Equivalent to `a[len(a):] = iterable`.

`list.insert(i, x)`
Insert an item at a given position. The first argument is the index of the element before which to insert, so `a.insert(0, x)` inserts at the front of the list, and `a.insert(len(a), x)` is equivalent to `a.append(x)`.

`list.remove(x)`
Remove the first item from the list whose value is x. It is an error if there is no such item.

`list.pop([i])`
Remove the item at the given position in the list, and return it. If no index is specified, a.pop() removes and returns the last item in the list. (The square brackets around the i in the method signature denote that the parameter is optional, not that you should type square brackets at that position. You will see this notation frequently in the Python Library Reference.)

`list.clear()`
Remove all items from the list. Equivalent to `del a[:]`.

`list.index(x[, start[, end]])`
Return zero-based index in the list of the first item whose value is x. Raises a ValueError if there is no such item.

The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.

`list.count(x)`
Return the number of times x appears in the list.

`list.sort(key=None, reverse=False)`
Sort the items of the list in place (the arguments can be used for sort customization, see sorted() for their explanation).

`list.reverse()`
Reverse the elements of the list in place.

`list.copy()`
Return a shallow copy of the list. Equivalent to `a[:]`.
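Several of the methods described above can be tried on a small list. This is a minimal sketch; the list `drinks` and its contents are made up for illustration.

```python
drinks = ['stella', 'corona', 'becks']

drinks.insert(1, 'leffe')            # insert before index 1
drinks.extend(['corona', 'bud'])     # append every item of the iterable
drinks.remove('corona')              # removes only the first 'corona'
last = drinks.pop()                  # removes and returns the last item
print(last, drinks)

position = drinks.index('corona')    # index of the remaining 'corona'
snapshot = drinks.copy()             # shallow copy, independent of later changes

drinks.sort(key=len)                 # sort in place by string length (stable)
drinks.reverse()                     # reverse in place
print(drinks)
print(snapshot)                      # unaffected by sort() and reverse()
```

Note that `sort()` and `reverse()` change the list in place and return `None`, which is why the copy is taken beforehand.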



In [None]:
#Create a list
brand=['budweiser','corona','becks','hoegaarden','leffe']

#Count the occurrences of 'becks'
brand.count('becks')

#Append 'corona' to the list
brand.append('corona')

#Return the index of 'hoegaarden'
print(brand.index('hoegaarden'))

print(brand)

In [None]:
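# Lists are mutable - Can be changed 
# Slicing a list results in new lists 

squares = [1, 4, 9, 16, 25]
squares.append(35)  # deliberately wrong: 6 squared is 36
print(squares)
squares[5] = 36     # item assignment corrects the value in place
squares[-3:]        # slicing returns a new list with the last three items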

In [None]:
# Assignment to slices is also possible 

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
letters

# replace some values
letters[2:5] = ['C', 'D', 'E']
letters

# now remove them
letters[2:5] = []
letters

# clear the list by replacing all the elements with an empty list
letters[:] = []
letters

In [None]:
# Nested lists can be created

a = ['a', 'b', 'c']
n = [1, 2, 3]

x = [a, n]

print(x)
print(x[0])

print(x[0][1])
print(x[1][2])

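y = a          # y is another name for the same list object, not a copy
y.extend(n)    # so this also changes a, and therefore x[0]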

print(y)

In [None]:
#Using the brand list, try out the following

#1. insert the string 'brahma' right after the item 'becks'
#2. reverse and sort the list
#3. return the position / index of the second occurrence of "corona" without hard-coding the position of the first
#4. remove the last element from the list and display it as an output
#5. remove the first element from the list and display it as an output

### Tuples 

* Tuples are sequence data types like lists
* They are immutable 
* They just consist of values separated by commas
* Whereas a list (usually) consists of homogeneous data type elements, tuples are (usually) used to store heterogeneous data types 
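The points above can be illustrated with a short sketch. The names `point`, `single`, `x_coord` and `y_coord` are illustrative only.

```python
# Packing: the commas create the tuple, the parentheses are optional
point = 3, 4
same_point = (3, 4)
print(point == same_point)

# A one-element tuple needs a trailing comma
single = (5,)
not_a_tuple = (5)
print(type(single).__name__, type(not_a_tuple).__name__)

# Unpacking assigns each element to its own name
x_coord, y_coord = point
print(x_coord, y_coord)

# Assigning to an element raises TypeError because tuples are immutable
try:
    point[0] = 10
except TypeError as err:
    print('TypeError:', err)
```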

In [None]:
t = 12345, 54321, 'hello!'
t[0]

t

# Tuples may be nested:
u = t, (1, 2, 3, 4, 5)
u

# Tuples are immutable:
#t[0] = 88888

# but they can contain mutable objects:
v = ([1, 2, 3], [3, 2, 1])
# conversely, a list can contain tuples (this rebinds v to a list):
v = [(1, 2, 3), (3, 2, 1)]
v[0][1]

In [None]:
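# Can a tuple be sorted?
tup = (3, 6, 8, 2, 78, 1, 23, 45, 9)
type(tup)
sorted(tup)        # returns a new sorted list
tup                # the original tuple is unchanged
tup = sorted(tup)  # rebinds the name: tup now refers to a list
tup
type(tup)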

### Dictionaries 

* Unlike lists, which are indexed by integers, dictionaries are indexed using keys
* Keys can be any immutable data type. Strings and numbers are good keys
* Dictionaries can be thought of as a collection of key:value pairs in which each key is unique (since Python 3.7, insertion order is preserved)
* The primary purpose of a dictionary is to store a value under a key and to look up the value for a given key

Let us look at some of the functions that work on a dictionary

In [None]:
tel = {'jack': 4098, 'sape': 4139}
tel['guido'] = 4127
tel

tel['jack']

del tel['sape']
tel['irv'] = 4127
tel

list(tel.keys())

sorted(tel.keys())

'guido' in tel

'jack' in tel

In [None]:
# dict() constructor can be used to create dictionaries 

dict([('sape', [1,2]), ('guido', 4127), ('jack', 4098)])
a=dict(sape=[0,9,9], guido=4127, jack=4098)
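The rule that keys must be immutable can be tested directly. The sketch below uses a made-up dictionary named `stock`; it also shows `get`, which returns a default instead of raising `KeyError` for a missing key.

```python
stock = {'lager': 12, 'stout': 5}    # hypothetical example data

# Indexing a missing key raises KeyError; get() returns a default instead
print(stock.get('porter', 0))

# Tuples are immutable, so they can serve as keys
stock[('ale', 'pale')] = 7
print(stock[('ale', 'pale')])

# Lists are mutable (unhashable), so using one as a key raises TypeError
try:
    stock[['ale', 'brown']] = 3
except TypeError as err:
    print('TypeError:', err)
```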


In [None]:
#Below is a dictionary that maps English words to their German equivalents
German ={"Pen":"Kuli","Pencil":"Bleistift" , "Glue":"Klebstift","Calculator":"Taschenrechner","Sharpener":"Spitzer","Scissors":"Schere","Cassette":"Kassette"}
#Remember that a dictionary is written with curly brackets; the same key-value notation is used in JSON

#Write code to:
#1. Extract all the keys
#2. Get the German equivalent of 'Glue'
#3. Create a new dictionary - {"Ruler":"Lineal","Book":"Buch","Dictionary":"Wörterbuch"},
#   Using this update the German dictionary

print(German)


### Sets
* A set is a collection of unique elements

In [None]:
s = {1,2,3}
type(s)
s

In [None]:
{1,2,3,1,2,1,2,3,3,3,3,2,2,2,1,1,2}

In [None]:
set([1,1,1,2,2,2,3,3,3,4,4,5]) # The unique elements of a list can be obtained by passing the list to the set function

In [None]:
print(s)
s.add(4) # Items can be added to a set by calling the add method on the set
print(s)
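Because sets hold unique elements, they support the usual set algebra. This is a short sketch with made-up sets `evens` and `small`.

```python
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

print(evens | small)   # union: elements in either set
print(evens & small)   # intersection: elements in both sets
print(evens - small)   # difference: elements only in evens
print(3 in small)      # membership test

# Adding an element that is already present leaves the set unchanged
small.add(4)
print(len(small))
```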

To summarise, there are four **collection data types** in the Python programming language:

- **List** is a collection which is ordered and changeable. Allows duplicate members.
- **Tuple** is a collection which is ordered and unchangeable. Allows duplicate members.
- **Dictionary** is a collection of key:value pairs which is changeable and indexed by key. No duplicate keys. Insertion order is preserved since Python 3.7.
- **Set** is a collection which is unordered and unindexed. No duplicate members.


### Functions

In [None]:
# Functions are defined with the keyword def
def my_func(name,sec_name):
    print('Hello',name,sec_name)

In [None]:
my_func(sec_name='John',name='Doe')

In [None]:
# If you want a default value for the parameter, you can use the following syntax
def my_func(name='Default Name'):
    print('Hello ' +name)

In [None]:
# Now if you do not pass any parameter, you would get the default value as an output
my_func()
my_func('John')
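Positional, keyword and default arguments can be combined in one function. The sketch below uses a hypothetical function `greet` that returns its result instead of printing it, so the three call styles are easy to compare.

```python
def greet(name, greeting='Hello'):   # hypothetical function for illustration
    return greeting + ' ' + name

print(greet('John'))                       # greeting falls back to its default
print(greet('John', 'Hi'))                 # both arguments passed by position
print(greet(greeting='Hey', name='Doe'))   # keywords may appear in any order
```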

In [None]:
# Functions can return a value
# Functions can have documentation strings. These are added by enclosing text in triple quotes as the first statement of the function body
def square(num):
    """
    This is a docstring.
    It can go over multiple lines.
    This function squares a number.
    """
    return num**2

In [None]:
output = square(3)
output
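A docstring is stored on the function object, so it can be read back at run time. This sketch defines a hypothetical function `cube` to show this; the built-in `help(cube)` would display the same text along with the signature.

```python
def cube(num):
    """Return num raised to the third power."""
    return num ** 3

print(cube(2))
print(cube.__doc__)   # the docstring is available as an attribute
```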

### Map function
* The map function lets you apply a function to every element of a list

In [None]:
def times2(var):
    return var*2

In [None]:
times2(5)

In [None]:
seq = [1,2,3,4,5]

In [None]:
# You could call the function in a for loop and append the results to a list, or use the map function. The map function
# lets you pass a function and the sequence you want to map that function to.

list(map(times2,seq))
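The loop-based approach and `map` give the same result, as the sketch below shows. A list comprehension is included as a third common alternative. In Python 3, `map` returns a lazy iterator, which is why it is wrapped in `list()`.

```python
def times2(var):
    return var * 2

seq = [1, 2, 3, 4, 5]

# Explicit loop: call the function and append each result
looped = []
for item in seq:
    looped.append(times2(item))

# map applies the function to every element
mapped = list(map(times2, seq))

# A list comprehension expresses the same idea
comprehended = [times2(item) for item in seq]

print(looped == mapped == comprehended)
```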


### Lambda expression
* In Python, an anonymous function is a function that is defined without a name. 
* While normal functions are defined using the def keyword, in Python anonymous functions are defined using the lambda keyword. 
* Hence, anonymous functions are also called lambda functions.
* It essentially gets rid of the obvious parts of defining, naming and returning.

In [None]:
# The above function can be written in a single line as follows:
def times2(var):return var*2

In [None]:
# The keyword lambda replaces the keyword def and the function name, and the expression to return directly follows
# the colon (without the return keyword). Hence, the above function can be written as follows:

lambda var:var*2

t = lambda var:var*2
t(6)

In [None]:
# The lambda function is generally used like this:
list(map(lambda var:var*2,seq))

In [None]:
# Try out
# Using range & map functions and a lambda expression, return a list that has the squares of all the numbers 
# from 1 to 10

### Filter function
* It has a very similar structure to map, but instead of applying a function to every element of a sequence, it keeps only the elements for which the function returns a true value.
* To filter, pass a function or a lambda function that returns a boolean value.

In [None]:
list(filter(lambda num:num%2 == 0,seq))
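`filter` also works with a named function, and a list comprehension with an `if` clause gives the same result. This is a sketch with made-up data; `is_long` is a hypothetical helper.

```python
names = ['ale', 'lager', 'stout', 'ipa', 'porter']   # hypothetical data

def is_long(text):
    """Return True for strings longer than four characters."""
    return len(text) > 4

kept_by_filter = list(filter(is_long, names))
kept_by_comprehension = [item for item in names if is_long(item)]

print(kept_by_filter)
print(kept_by_filter == kept_by_comprehension)
```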

In [None]:
# Try out
# Using range & filter functions and a lambda expression, return a list containing the multiples of 3 from 3 to 30.

### Methods
* Methods are functions attached to an object and called with dot notation. A method may return a result, modify the object, or both.

In [None]:
# Some important string methods

s = 'Hello my name is Sam'

s.lower() # Returns a copy with every letter converted to lowercase
s.upper() # Returns a copy with every letter converted to uppercase

s.split() # By default, splits the string on whitespace and returns a list of words

tweet = 'Go Liverpool! #Liverpool'
tweet.split('#')

In [None]:
x=['a','b']
'|'.join(x)
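`split` and `join` are inverses of each other when the same separator is used. The sketch below also shows that string methods return new strings and leave the original untouched. The variable names are illustrative.

```python
sentence = 'Hello my name is Sam'

pieces = sentence.split(' ')      # string -> list of words
rejoined = ' '.join(pieces)       # list of words -> string
print(pieces)
print(rejoined == sentence)

shouted = sentence.upper()        # returns a new string
print(shouted)
print(sentence)                   # the original is unchanged
```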

In [None]:
# Some useful dictionary methods

d = {'k1':1,'k2':2}

d.keys() # Returns the keys of the dictionary
d.items() # Returns the dictionary items
d.values() # Returns the values
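These methods are most often used in loops. The sketch below iterates over the key-value pairs and shows that the object returned by `keys()` is a live view which reflects later changes to the dictionary. The names `keys_view` and `total` are illustrative.

```python
d = {'k1': 1, 'k2': 2}

# items() yields (key, value) pairs that can be unpacked in the loop header
for key, value in d.items():
    print(key, '->', value)

# keys() returns a view, not a snapshot
keys_view = d.keys()
d['k3'] = 3
print(list(keys_view))

# values() can be passed to any function that accepts an iterable
total = sum(d.values())
print(total)
```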