Declaring Variables  

In Python, a variable is created when a value is first assigned to it, without specifying a data type. Python infers the type of the variable from the value assigned to it. A variable also need not hold values of the same type during its lifetime: a variable initialized with a value of one type (e.g., integer) can later be reassigned a value of a different type (e.g., string). This dynamic typing can increase productivity and code reusability.
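A minimal sketch of reassignment, using a hypothetical name `value`: the name is first bound to an integer and then to a string. Nothing is converted. The name simply refers to a new object, and `type()` reports the type of whatever it currently refers to.

```python
# Bind the name to an integer first
value = 42
print(type(value))   # <class 'int'>

# Rebind the same name to a string; no declaration or cast is needed
value = "forty-two"
print(type(value))   # <class 'str'>
```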

Python supports the following basic data types, among others:

1. int – Integer type.
2. float – Floating-point numbers.
3. bool – Boolean values, written with the literals True and False. bool is a subtype of int.
4. str – Textual data (strings).
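The claim that bool is a subtype of int can be checked directly. This short sketch shows that True and False behave like 1 and 0 in arithmetic.

```python
# bool is a subclass of int
print(issubclass(bool, int))   # True
print(isinstance(True, int))   # True

# True and False behave like 1 and 0 in arithmetic
print(True + True)             # 2
print(False * 10)              # 0
```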

The example below creates several variables and assigns them values of different data types: integer, float, Boolean, and string.


In [2]:
#Declaring variables
var1 = 2
var2 = 5.0
var3 = True
var4 = "Machine Learning"


In [3]:
print(var1,var2,var3,var4)

2 5.0 True Machine Learning


In [4]:
print("Value of var1 :", var1)
print("Value of var2 :", var2)
print("Value of var3 :", var3)
print("Value of var4 :", var4)

Value of var1 : 2
Value of var2 : 5.0
Value of var3 : True
Value of var4 : Machine Learning


In [6]:
# Checking the type of each variable
print(type(var1))
print(type(var2))
print(type(var3))
print(type(var4))

<class 'int'>
<class 'float'>
<class 'bool'>
<class 'str'>

In [21]:
# A bool can be converted to int: True becomes 1 and False becomes 0
int(var3)

1

In [24]:
type(var3)

bool

Conditional Statements

Python supports if-elif-else for writing conditional statements. The condition must be followed by a colon (:), and the code block that follows must be indented. Indentation in Python is not optional; it is part of the syntax. The condition does not need to be enclosed in parentheses.


In [7]:
# Checking a condition if the variable value is more than 1
if var1 > 1:
    print( "Bigger than 1" )

Bigger than 1


In [8]:
x = 10
y = 12
# if x is greater than y
if x > y:
    print ("x > y")
# if x is less than y
elif x < y:
    print ("x < y")
else:
    print ("x = y")

x < y


The ternary (conditional) expression tests a condition in a single line instead of a multi-line if-else. For example, the code below assigns True or False to a variable based on a condition check.


In [9]:
# Initialize
x = 5
# Assign True if x is more than 10 or assign False using ternary operator
isGreater = True if x > 10 else False

In [10]:
isGreater

False
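Two related points, shown as a small sketch: the comparison `x > 10` already evaluates to a bool, so it can be assigned directly, and a conditional expression can choose between any two values, not only True and False. The name `parity` is hypothetical, used for illustration.

```python
x = 5

# The comparison already yields a bool, so this matches the ternary above
isGreater = x > 10
print(isGreater)          # False

# A conditional expression can choose between any two values
parity = "even" if x % 2 == 0 else "odd"
print(parity)             # odd
```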

Generating Sequence Numbers

Sometimes it is necessary to create a sequence of numbers. The range() function generates a sequence of numbers. It takes the following three parameters:

1. start: Starting number of the sequence. It is optional and the default value is 0.
2. stop: Generate numbers up to, but not including, this number.
3. step: Difference between consecutive numbers in the sequence. It is optional and the default value is 1.


In [11]:
# Initializing the sequence of numbers starting from 1
# and ending (not including) with 6
numbers = range( 1, 6 )
numbers

range(1, 6)
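As the output above shows, a range object produces its numbers on demand rather than displaying them. A short sketch of the optional parameters, using list() to materialize each range so its contents are visible:

```python
# One argument: stop only; start defaults to 0 and step to 1
print(list(range(5)))          # [0, 1, 2, 3, 4]

# Three arguments: start, stop, step
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]

# A negative step counts down; stop is still excluded
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]
```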

Control Flow Statements

A for loop can be used to print the sequence of numbers by iterating through it as follows. 

In [12]:
# Iterate through the collection
for i in numbers:
    print (i)

1
2
3
4
5


The for loop automatically advances through the range on each iteration and stops once the range is exhausted. The loop above prints the values 1 to 5 because numbers covers 1 to 6, not including 6. A while loop keeps executing as long as its condition is true and stops once the condition becomes false. The integers 1 to 5 can be printed using a while loop as follows:

In [13]:
# Initialize i to 1
i = 1
# Continue looping while i is at most 5
while i <= 5:
    print(i)
    # Increment the value of i
    i = i+1
# Printed once, after the loop ends
print('Done')

1
2
3
4
5
Done


In a while loop, the state of i has to be managed explicitly. If i is never incremented, the condition stays true and the loop runs forever (an infinite loop).
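Another common way to write the same loop is `while True` combined with `break`, which places the exit condition explicitly inside the loop body. This is a sketch of that alternative style; it prints 1 to 5 and then Done, like the loop above.

```python
i = 1
while True:
    if i > 5:
        break        # leave the loop once i passes 5
    print(i)
    i += 1           # without this update the loop would never end
print('Done')
```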

Functions

Functions are one of the most important building blocks of a program.

1. Functions are created using the def keyword.

2. The function signature contains the function name followed by the input parameters enclosed in parentheses, and must end with a colon (:).

3. Parameters become optional if they are given default values in the definition. The code block inside the function must be indented.

4. A function returns a value using a return statement. A function without a return statement returns None, which is similar to the void return type in languages such as C, C++, or Java.

The types of the parameters and the return value are determined at runtime. The following function takes two parameters and returns the result of adding them with the + operator.

In [14]:
def addElements( a, b ):
    return a + b

In [15]:
result = addElements( 2, 3 )
result

5

In [16]:
result = addElements( 2.3, 4.5 )
result

6.8

In [17]:
result = addElements( "python", "workshop" )
result

'pythonworkshop'

In [18]:
def addElements( a, b = 4 ):
    return a + b

In [19]:
addElements( 2 )

6

In [20]:
addElements( 2, 5 )

7

As these examples show, the types of the parameters and the return value are determined by the values passed to the function. When two strings are passed, they are concatenated, because the + operator is also overloaded for string concatenation. A default value for a parameter can be defined in the function signature, which makes that parameter optional.

Example: In [18] above redefines addElements() with parameter b initialized to 4, so addElements( 2 ) returns 6.
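Two further points from the list above, shown with a hypothetical function `greet()` written only for illustration: a function without a return statement evaluates to None, and arguments can be passed by keyword in any order.

```python
def greet(name, greeting="Hello"):
    # No return statement, so the call evaluates to None
    print(greeting + ", " + name)

result = greet("Kohli")                   # prints: Hello, Kohli
print(result)                             # None

# Arguments can also be passed by keyword, in any order
greet(greeting="Welcome", name="Dhoni")   # prints: Welcome, Dhoni
```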


Working with Collections

Collections are containers (data structures) used to store and manipulate groups of homogeneous or heterogeneous elements. We will discuss the following collections in this section:

1. List  
2. Tuple  
3. Set  
4. Dictionary 


List

Lists are like arrays, but they can contain heterogeneous items; that is, a single list can contain items of type integer, float, string, or other objects.

A list is not restricted to unique items, so values can repeat. Lists are mutable and are usually initialized either with values listed inside square brackets or as an empty list.


In [22]:
## Create an empty list
emptyList = []

In [21]:
batsmen = ['Rohit', 'Dhawan', 'Kohli', 'Rahane', 'Rayudu', 'Dhoni']

The list index starts with 0. An item in the list can be accessed using index as follows: 

In [23]:
batsmen[0]

'Rohit'

A slice of the list can be obtained using an index range separated by a colon (:). 

A range [0:2] means starting with index 0 until index 2, but not including 2. 


In [24]:
## Slicing a list
batsmen[0:2]

['Rohit', 'Dhawan']

In [25]:
## Accessing the last element
batsmen[-1]

'Dhoni'
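Slices also accept omitted bounds, negative indices, and a step. A short sketch on the same batsmen list:

```python
batsmen = ['Rohit', 'Dhawan', 'Kohli', 'Rahane', 'Rayudu', 'Dhoni']

# Omitting start or stop means "from the beginning" / "to the end"
print(batsmen[:2])     # ['Rohit', 'Dhawan']
print(batsmen[-2:])    # ['Rayudu', 'Dhoni']

# A third value in the slice is the step
print(batsmen[::2])    # ['Rohit', 'Kohli', 'Rayudu']
print(batsmen[::-1])   # reversed copy; the original list is unchanged
```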

In [26]:
# how many elements in the list
len( batsmen )

6

In [27]:
bowlers = ['Bumrah', 'Shami', 'Bhuvi', 'Kuldeep', 'Chahal']

In [28]:
#Two separate lists can be concatenated into one list using + operator. 

all_players = batsmen + bowlers
all_players 

['Rohit',
 'Dhawan',
 'Kohli',
 'Rahane',
 'Rayudu',
 'Dhoni',
 'Bumrah',
 'Shami',
 'Bhuvi',
 'Kuldeep',
 'Chahal']

To check whether an item exists in a list, use the in operator. It returns True if the item exists and False otherwise.

In [29]:
'Bumrah' in bowlers

True

In [30]:
'Rayudu' in bowlers

False

In [31]:
#Finding the index of an item in the list.

all_players.index( 'Dhoni' )

5

In [32]:
# The items in a list can be arranged in reverse order by calling the reverse() method, which modifies the list in place.

all_players.reverse()
all_players

['Chahal',
 'Kuldeep',
 'Bhuvi',
 'Shami',
 'Bumrah',
 'Dhoni',
 'Rayudu',
 'Rahane',
 'Kohli',
 'Dhawan',
 'Rohit']
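Because lists are mutable, they also support in-place methods for adding, removing, and ordering items. In this sketch, 'Jadeja' and 'Ishant' are illustrative values added only to demonstrate the methods.

```python
bowlers = ['Bumrah', 'Shami', 'Bhuvi', 'Kuldeep', 'Chahal']

bowlers.append('Jadeja')        # add to the end
bowlers.insert(0, 'Ishant')     # insert at index 0
bowlers.remove('Shami')         # remove the first matching value
last = bowlers.pop()            # remove and return the last item

bowlers.sort()                  # sort in place (alphabetically for strings)
print(bowlers)                  # ['Bhuvi', 'Bumrah', 'Chahal', 'Ishant', 'Kuldeep']
print(last)                     # Jadeja
```

sort() and reverse() change the list itself and return None; the built-in sorted() returns a new sorted list instead.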

Tuples

A tuple is an ordered sequence like a list, but it is immutable: once a tuple has been created, it cannot be modified. For example, the following tuple contains the name of a cricketer and the year of his one-day international (ODI) debut.

In [33]:
odiDebut = ( 'Kohli', 2008 )

In [34]:
odiDebut

('Kohli', 2008)

In [35]:
#Tuple element’s index also starts with 0. 

odiDebut[0]

'Kohli'

In [36]:
# Tuple elements cannot be changed. For example, trying to change the year in the tuple raises an error.

odiDebut[1] = 2009

TypeError: 'tuple' object does not support item assignment

In [37]:
# An existing list can be converted into a tuple using tuple(). Here the list all_players is converted into a tuple so that it can no longer be modified.

players = tuple( all_players )

In [38]:
players

('Chahal',
 'Kuldeep',
 'Bhuvi',
 'Shami',
 'Bumrah',
 'Dhoni',
 'Rayudu',
 'Rahane',
 'Kohli',
 'Dhawan',
 'Rohit')
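A tuple can be unpacked into separate variables, and a one-element tuple needs a trailing comma, because parentheses alone only group an expression. A brief sketch:

```python
odiDebut = ('Kohli', 2008)

# Unpack the tuple into separate variables
name, year = odiDebut
print(name, year)            # Kohli 2008

# A one-element tuple needs a trailing comma
print(type((2008)))          # <class 'int'>
print(type((2008,)))         # <class 'tuple'>
```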

Set

A set is a collection of unique elements; that is, the values cannot repeat. A set can be initialized with a comma-separated list of items enclosed in curly braces.

In [39]:
setOfNumbers = {6,1,1,2,4,5}

The set automatically removes duplicates and keeps only the unique numbers.

In [40]:
setOfNumbers

{1, 2, 4, 5, 6}
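A few set basics, sketched below. Empty braces create a dictionary, so an empty set must be created with set(). Adding an element that is already present has no effect. Sets are also unordered, so the display order above should not be relied upon.

```python
# {} creates an empty dict, so an empty set must be created with set()
emptySet = set()
print(type({}), type(emptySet))   # <class 'dict'> <class 'set'>

numbers = {6, 1, 1, 2, 4, 5}
numbers.add(3)        # adding a new element
numbers.add(1)        # adding an existing element has no effect
numbers.discard(10)   # discard() does not raise if the element is missing
print(len(numbers))   # 6
```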

Sets support operations such as union, intersection, and difference. To understand these operations, let us create two sets containing the batsmen in India's 2011 and 2015 World Cup teams.


In [41]:
wc2011 = {"Dhoni", "Sehwag", "Tendulkar", "Gambhir", "Kohli", "Raina", "Yuvraj","Yusuf"}
wc2015 = {"Dhoni", "Dhawan", "Rohit", "Rahane", "Kohli", "Raina", "Rayudu", "Jadeja"}

In [42]:
# To find all batsmen who played in either the 2011 or the 2015 World Cup, take the union of the two sets.
wc2011.union( wc2015 )

{'Dhawan',
 'Dhoni',
 'Gambhir',
 'Jadeja',
 'Kohli',
 'Rahane',
 'Raina',
 'Rayudu',
 'Rohit',
 'Sehwag',
 'Tendulkar',
 'Yusuf',
 'Yuvraj'}

In [43]:
# To find the batsmen who played in both the 2011 and 2015 World Cups, take the intersection of the sets wc2011 and wc2015.

wc2011.intersection( wc2015 )

{'Dhoni', 'Kohli', 'Raina'}

In [44]:
# To find the new batsmen who played in the 2015 World Cup but were not part of the 2011 World Cup, subtract wc2011 from wc2015.

wc2015.difference( wc2011 )

{'Dhawan', 'Jadeja', 'Rahane', 'Rayudu', 'Rohit'}
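The same set operations are also available as operators. This sketch repeats the two World Cup sets and adds the symmetric difference (players in exactly one of the two teams), which the methods above do not cover.

```python
wc2011 = {"Dhoni", "Sehwag", "Tendulkar", "Gambhir", "Kohli", "Raina", "Yuvraj", "Yusuf"}
wc2015 = {"Dhoni", "Dhawan", "Rohit", "Rahane", "Kohli", "Raina", "Rayudu", "Jadeja"}

both = wc2011 & wc2015        # same as wc2011.intersection(wc2015)
either = wc2011 | wc2015      # same as wc2011.union(wc2015)
newIn2015 = wc2015 - wc2011   # same as wc2015.difference(wc2011)
onlyOne = wc2011 ^ wc2015     # symmetric difference: in exactly one of the two sets

print(sorted(both))                # ['Dhoni', 'Kohli', 'Raina']
print(len(either), len(onlyOne))   # 13 10
```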

Dictionary

A dictionary is a collection of key-value pairs, in which all keys are unique.
For example, the following dictionary contains the ICC ODI World Cup winners from 1975 to 2011, where the key is the year of the tournament and the value is the name of the winning country.

In [45]:
wcWinners = { 1975: "West Indies",
              1979: "West Indies",
              1983: "India",
              1987: "Australia",
              1991: "Pakistan",
              1996: "Srilanka",
              1999: "Australia",
              2003: "Australia",
              2007: "Australia",
              2011: "India"}

The value of a specific dictionary element can be accessed by its key. For example, to find the winning country in a specific year:

In [46]:
wcWinners[1983]

'India'

In [47]:
# For a list of all winning countries, use values():

wcWinners.values()

dict_values(['West Indies', 'West Indies', 'India', 'Australia', 'Pakistan', 'Srilanka', 'Australia', 'Australia', 'Australia', 'India'])

The values above contain repeated country names because some countries have won multiple times. To get the unique list of countries, the values can be converted to a set.


In [69]:
set(wcWinners.values()) 

{'Australia', 'India', 'Pakistan', 'Srilanka', 'West Indies'}

To add a new key-value pair to the dictionary, assign a value to the new key:

In [72]:
wcWinners[2015] = 'Australia'
wcWinners

{1975: 'West Indies',
 1979: 'West Indies',
 1983: 'India',
 1987: 'Australia',
 1991: 'Pakistan',
 1996: 'Srilanka',
 1999: 'Australia',
 2003: 'Australia',
 2007: 'Australia',
 2011: 'India',
 2015: 'Australia'}
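A sketch of a few other everyday dictionary operations on the same data. The in operator checks keys, get() returns a default instead of raising KeyError for a missing key, and items() yields key-value pairs for iteration. Since Python 3.7, dictionaries preserve insertion order, so the loop visits the years in the order they were added.

```python
wcWinners = {1975: "West Indies", 1979: "West Indies", 1983: "India",
             1987: "Australia", 1991: "Pakistan", 1996: "Srilanka",
             1999: "Australia", 2003: "Australia", 2007: "Australia",
             2011: "India", 2015: "Australia"}

# `in` checks keys, not values
print(2011 in wcWinners)                      # True
print(1980 in wcWinners)                      # False

# wcWinners[1980] would raise KeyError; get() returns a default instead
print(wcWinners.get(1980, "No tournament"))   # No tournament

# items() yields (key, value) pairs for iteration
for year, winner in wcWinners.items():
    if winner == "India":
        print(year)                           # 1983, then 2011
```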

Dealing with Strings

A string in Python can be initialized with single or double quotes. 

In [48]:
string0 = 'python'
string1 = "machine learning"

In [49]:
# A multiline string can be initialized with triple quotes.
string2 = """This is a
multiline string"""

In [50]:
# Converting to upper case
string0.upper()
# Similarly string.lower() can be used to convert to lower case.
# string0.lower()

'PYTHON'

To split a string into a list of words (tokens) separated by spaces, use split():

In [51]:
tokens = string1.split(' ')
tokens

['machine', 'learning']
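A few other commonly used string methods are sketched below. split() with no argument splits on any run of whitespace and drops leading and trailing whitespace, and join() reverses a split. Strings are immutable, so each method returns a new string.

```python
string1 = "machine learning"

# split() with no argument splits on any run of whitespace
print("  machine   learning ".split())       # ['machine', 'learning']

# join() is the inverse of split(): it glues tokens with a separator
tokens = string1.split(' ')
print("-".join(tokens))                       # machine-learning

print(string1.replace("machine", "deep"))     # deep learning
print("  padded  ".strip())                   # padded
print(string1.title())                        # Machine Learning
```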

Functional Programming

Functional programming allows functions to be passed as arguments to other functions, just like variables. This makes it possible to create higher-order functions. One core benefit of functional programming in data analysis is that transformations or filters can be applied to a set of records or columns more concisely than with plain loops.


Example 1: Map

Let us say we have a list named intList which contains integers as defined below. 

In [52]:
intList = [1,2,3,4,5,6,7,8,9]

We want to create another list named squareList, which contains the squares of all elements in intList. A typical approach is to write a for loop as below.

In [53]:
# Create an empty list.
squareList = []
# Loop through the intList, square every item and append to result list squareList.
for x in intList:
    squareList.append( pow( x, 2 ) )

print( squareList )

[1, 4, 9, 16, 25, 36, 49, 64, 81]


The above code is verbose: it creates an empty list and appends the squared values one at a time. Using a functional programming approach, the same transformation can be written more concisely, as described in the steps below. Note that Python's built-in map() also processes elements sequentially, so the benefit here is conciseness, not parallelism.

Step 1: Define a function square_me() that takes an integer and returns the square value of it. 

In [74]:
def square_me( x ):
    return x * x

Step 2: The function square_me and the list of integers can be passed to the higher-order function map(). map() applies the function to each element of the list and returns an iterator over the results, which can be converted to a list with list().

In [55]:
squareList = map( square_me, intList)

In [56]:
list(squareList)

[1, 4, 9, 16, 25, 36, 49, 64, 81]

The function square_me() is just one line of code and can instead be written as an anonymous function, that is, a function without a name, defined using the lambda keyword.

In [57]:
# The same map written with an anonymous (lambda) function
squareList = map(lambda x: x*x, intList)
list(squareList)

[1, 4, 9, 16, 25, 36, 49, 64, 81]
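Because map() returns an iterator, its results can be consumed only once; converting the same map object to a list a second time yields an empty list. The sketch below also shows an equivalent list comprehension, an alternative idiomatic way to build the same list (not the approach used in this section).

```python
intList = [1, 2, 3, 4, 5, 6, 7, 8, 9]

squares = map(lambda x: x * x, intList)
print(list(squares))   # [1, 4, 9, 16, 25, 36, 49, 64, 81]
print(list(squares))   # []: the iterator is already exhausted

# An equivalent list comprehension builds the list directly
squareList = [x * x for x in intList]
print(squareList)      # [1, 4, 9, 16, 25, 36, 49, 64, 81]
```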

Example 2: Filter

Similarly, filters can be applied using functional programming. For example, suppose we want to select only the even numbers from intList, that is, the numbers divisible by 2. This can be achieved with the higher-order function filter(). filter() takes a function that acts as a predicate and returns True or False for each element; elements for which it returns False are filtered out. To check whether an integer is even, the predicate x % 2 == 0 can be used.

In [58]:
evenInts = filter( lambda x : x % 2 == 0, intList )

In [59]:
list( evenInts )

[2, 4, 6, 8]
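Since filter() and map() both accept any iterable, they can be chained. This sketch keeps the even numbers and then squares them. It is one possible composition, shown for illustration.

```python
intList = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# filter() keeps the even numbers, then map() squares each of them
evenSquares = list(map(lambda x: x * x, filter(lambda x: x % 2 == 0, intList)))
print(evenSquares)        # [4, 16, 36, 64]

# Built-ins such as sum() accept the resulting list directly
print(sum(evenSquares))   # 120
```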

Functional programming is an important aspect of Python programming and will be used extensively during data analysis, where we mostly deal with collections of records. For tasks such as transformations and filters, functional programming can be a handy alternative to plain loops.


Modules and Packages

In Python, a module is a file that contains functions, classes, and variables. A set of modules under a common namespace (usually a directory) is called a package. Modules and packages can be imported into another module using the import statement.

For example, to use mathematical functions, Python’s math module can be imported.

In [60]:
import math
## Taking square root of a value
math.sqrt(16)

4.0

Using the from…import syntax, a specific module or object (e.g., a class or function) can be imported from a module or package. For example, the following statement imports only the sample() function from the random module.

In [61]:
from random import sample

Another example is generating a random set of numbers from a range. sample() takes a population (here a range) and the number of elements to draw, and returns that many unique elements chosen at random. The code below draws 3 random numbers from range(0, 11), that is, from the integers 0 to 10 (11 is excluded).

In [62]:
sample( range(0, 11), 3)

[3, 1, 7]
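The output above will differ on every run. When reproducible results are needed, the generator can be seeded with random.seed(); the seed value 42 in this sketch is arbitrary. Reseeding with the same value reproduces the same draw.

```python
import random

random.seed(42)                       # fix the generator state
first = random.sample(range(0, 11), 3)

random.seed(42)                       # reset to the same state
second = random.sample(range(0, 11), 3)

print(first == second)                # True: same seed, same draw
print(len(set(first)))                # 3: sample() never repeats an element
```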

Other Features

It may be necessary to return multiple values from a function. This can be achieved by returning a tuple, an important feature we will use in subsequent chapters.

For example, the following code defines a function that returns the mean and the median of a list of randomly generated numbers.


In [63]:
import random
randomList = random.sample( range(0, 100), 20)
randomList

[72, 3, 65, 1, 51, 69, 74, 99, 16, 53, 68, 27, 95, 81, 56, 64, 67, 32, 19, 96]

In [64]:
from statistics import mean, median
def getMeanAndMedian( listNum ):
    return mean(listNum), median(listNum)

getMeanAndMedian() returns a tuple with two elements, which is unpacked into two separate variables when the function is invoked.


In [65]:
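# Unpack into new names so the imported mean() and median() functions are not overwritten
meanValue, medianValue = getMeanAndMedian( randomList )

In [66]:
print( "Mean: ", meanValue, " Median: ", medianValue)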

Mean:  55.4  Median:  64.5
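The returned value is an ordinary tuple, which can be kept whole or unpacked. This sketch uses a small fixed list instead of random data so the results are predictable. It also shows why the receiving variables should not be named mean and median: that would rebind the names of the imported functions.

```python
from statistics import mean, median

def getMeanAndMedian(listNum):
    return mean(listNum), median(listNum)

result = getMeanAndMedian([1, 2, 3, 4])
print(type(result))        # <class 'tuple'>
print(result)              # (2.5, 2.5)

# Unpack into names that do not shadow the imported functions
avg, mid = result
print(avg, mid)            # 2.5 2.5
```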


In [26]:
lis = [10 , [-1,2,3,4,5]]

In [28]:
type(lis)

list

In [34]:
x = [10,20,0,4,5]

In [41]:
y = bytes(x)

In [42]:
print(y)

b'\n\x14\x00\x04\x05'


In [43]:
type(y)

bytes
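A few properties of bytes objects, sketched below. Indexing returns an int. Every value must fit in one byte (0 to 255), otherwise bytes() raises ValueError. Like tuples, bytes objects are immutable, and bytearray is the mutable counterpart.

```python
x = [10, 20, 0, 4, 5]
y = bytes(x)

# Indexing a bytes object returns an int, not a one-character string
print(y[0])           # 10

# Every value must fit in one byte (0-255)
try:
    bytes([256])
except ValueError as err:
    print("ValueError:", err)

# bytearray is mutable, so its items can be reassigned
b = bytearray(x)
b[0] = 100
print(list(b))        # [100, 20, 0, 4, 5]
```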

In [45]:
type(x)

list

Lst = [ 10, -20, 15.5, 'Sravan', "Roger"]

In [48]:
# Iterating over a bytes object yields its values as integers
for i in y: print(i)

10
20
0
4
5

In [56]:
tpl = ( 10, -20, 15.5, 'Sravan', "Roger")

In [57]:
type(tpl)

tuple

In [59]:
# Wrong: parentheses make this a function call, which cannot be assigned to
Lst(0) = 100

SyntaxError: can't assign to function call (<ipython-input-59-96024a1d24a8>, line 1)

In [61]:
Lst[0] = 100

In [63]:
print(Lst)

[100, -20, 15.5, 'Sravan', 'Roger']


In [68]:
type(Lst * 2)

list

In [70]:
Lst1 = ['Roger', 'Rafa', 'Novak', 'Nick']

In [72]:
Lst1

['Roger', 'Rafa', 'Novak', 'Nick']

In [74]:
Lst2 = Lst + Lst1

In [76]:
print(Lst2)

[100, -20, 15.5, 'Sravan', 'Roger', 'Roger', 'Rafa', 'Novak', 'Nick']


In [78]:
'Roger' in Lst2

True

In [86]:
s = { 10, 20, 20, 30, 50 , "Roger" }

In [87]:
print(s)

{10, 'Roger', 50, 20, 30}


In [88]:
type(s)

set

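In [91]:
# set() on a string creates a set of its unique characters
ch = set("Hello") 

In [93]:
print(ch)

{'o', 'l', 'e', 'H'}


In [95]:
# Parentheses alone do not create a tuple; ch1 is simply the string "Hello"
ch1 = ("Hello")


In [96]:
print(ch1)

Hello


In [98]:
# Sets are unordered, so their elements cannot be accessed by index
print(s[0])

TypeError: 'set' object is not subscriptable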

In [119]:
ch.update((50,60))

In [120]:
print(ch)

{'o', 'H', 'e', 50, 'l', 60}


In [121]:
type(ch)

set

In [113]:
ch.remove(50)

In [122]:
print(ch)

{'o', 'H', 'e', 'l', 60}


In [128]:
ch1 = { 10,20,30,50,40}

In [129]:
print(ch1)

{40, 10, 50, 20, 30}


In [136]:
ch.difference(ch1)

{60, 'H', 'e', 'l', 'o'}

In [138]:
d = {14 : 'Sravan' , 15 : 'Srihari', 21 : 'Vidyasagar'}

In [140]:
type(d)

dict