# Data Analytics and Visualization with Python

### Learning Objectives

- Introduction to Analytics using Python
    - Python Basics for Analytics (Revision)
    - numpy and pandas library
    - Reading data from various sources (excel, csv, database, json)
    - Cleaning and Preparing Data
- Descriptive Statistics
- Visualizing Data
    - Introduction to matplotlib library
    - Anatomy of a figure
    - Creating sub-plots
    - Chart aesthetics
- Visual Data Analytics
    - Univariate Analysis
        - count plots
        - histograms and boxplot
    - Bivariate Analysis
        - scatter plot
        - bar plot
        - line charts
        - pair plots, heatmaps

## Python Basics for Analytics 

#### Built-in data structures
- list  - [], mutable, mixed data, indexed
- tuple - (), immutable, mixed data, indexed
- set  - {}, mutable, no duplicates, unordered, mixed data, but elements must be immutable (hashable) objects
- dict - {}, mutable, key:value pairs; keys - no duplicates, immutable (hashable) objects; values - any type
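The properties listed above can be checked with a few lines of code. This is a minimal sketch; all variable names and values are made up for illustration.

```python
# Illustrative values only; none of these names come from the course material.
marks_list = [70, "absent", 85]        # list: mutable, mixed data, indexed
marks_list[1] = 60                     # in-place update works because lists are mutable

point = (10, 20)                       # tuple: immutable, indexed
try:
    point[0] = 99                      # tuples do not support item assignment
except TypeError as err:
    tuple_error = str(err)

cities = {"Delhi", "Mumbai", "Delhi"}  # set: the duplicate "Delhi" is stored once
ages = {"Jane": 30, "Jack": 20}        # dict: unique keys mapped to values of any type
ages["Jane"] = 31                      # assigning to an existing key overwrites its value

print(marks_list)
print(tuple_error)
print(sorted(cities))
print(ages)
```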

#### Python Functions

- sorted(), zip(), enumerate(), lambda functions

###### Ex. WAP to print sum of numbers 1 - 10

In [1]:
sum(range(1, 11))

55

In [2]:
import math
math.prod(range(1, 11))

3628800

###### Ex. WAP to sort the given list in DESC order

In [4]:
numbers = [1, 3, 4, 2, 5]
numbers.sort(reverse=True)
numbers

[5, 4, 3, 2, 1]

In [6]:
numbers = (1, 3, 4, 2, 5)
sorted(numbers, reverse=True)  # sorted - sorts any iterable and returns a new list object

[5, 4, 3, 2, 1]

###### Ex. WAP to replace all vowels in a string with "*"

In [12]:
word = input("Enter a word - ")
for ch in "aeiouAEIOU" :
    word = word.replace(ch, "*")
word

Enter a word -  SINGAPORE


'S*NG*P*R*'

In [18]:
word = input("Enter a word - ")
trans_obj = str.maketrans("aeiou", "@3!0_")
word.translate(trans_obj)

Enter a word -  new delhi


'n3w d3lh!'

In [17]:
print("-"*50)

--------------------------------------------------


###### Ex. WAP to swap first and last character of a word

In [22]:
word = input("Enter a word - ") # i/p - "mumbai"  o/p - "iumbam"
word[-1] + word[1 : -1] + word[0]

Enter a word -  mumbai


'iumbam'

#### Ex. Calculate Gross Pay
- Take hours worked and rate per hour as input from the user.
- If the hours worked are 40 or less, apply the given rate.
- If the hours worked exceed 40, apply the given rate for the first 40 hours and 1.5 times the rate for the additional hours as overtime pay.

In [26]:
hours = int(input("Enter number of hours - "))
rate = int(input("Enter rate per hour - "))

if hours <= 40 :
    gross_pay = hours * rate
else:
    gross_pay = (40 * rate) + (hours - 40) * 1.5 * rate
gross_pay

Enter number of hours -  45
Enter rate per hour -  100


4750.0

###### Ex. WAP to generate a list of squares of number in range of 1-10

In [27]:
[i**2 for i in range(1, 11)]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

###### Ex. WAP to generate a list of squares of numbers divisible by 3 in range of 1-20

In [28]:
[i**2 for i in range(1, 21) if i % 3 == 0]

[9, 36, 81, 144, 225, 324]

###### Ex. WAP to create dict of numbers in range 1-10 as keys and their squares as values

In [29]:
{i : i**2 for i in range(1, 11)}

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}

###### Ex. WAP to create a dict of numbers divisible by 3 in range 1-20 as keys and their type(even or odd) as values

In [30]:
{i : "even" if i % 2 == 0 else "odd" for i in range(1, 21) if i % 3 == 0}

{3: 'odd', 6: 'even', 9: 'odd', 12: 'even', 15: 'odd', 18: 'even'}

`Comprehensions` are an elegant way to define and create mutable data structures such as lists, sets and dictionaries from existing sequences.

Syntax - 

`[<expression> for <var> in <sequence> if <condition>]`

Steps to write a comprehension:

1. Identify the sequence
2. Identify the condition, if any
3. Write the expression
4. Choose the mutable data structure (list, set or dict)
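The four steps can be traced on a small example. The sketch below builds a set comprehension, which is the one form not shown in the exercises so far. The `words` list is hypothetical.

```python
# Hypothetical input, used only to walk through the four steps.
words = ["apple", "kiwi", "avocado", "fig", "almond", "kiwi"]

# 1. Sequence: words
# 2. Condition: keep only words that start with "a"
# 3. Expression: the length of each word
# 4. Data structure: a set, so repeated lengths are stored once
lengths = {len(w) for w in words if w.startswith("a")}

# The same logic written as an explicit loop
lengths_loop = set()
for w in words:
    if w.startswith("a"):
        lengths_loop.add(len(w))

print(lengths == lengths_loop)
```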

###### Ex. WAP to add 7% service tax to all the values in the "sales" list

In [31]:
sales = [290, 500, 800, 650]
[i * 1.07 for i in sales]

[310.3, 535.0, 856.0, 695.5]

###### Ex. WAP to sum all the values in the "sales" tuple

In [34]:
sales = ("$290", "$500", "$800", "$650")
sum([int(i.replace("$", "")) for i in sales])

2240

In [35]:
sales = ("$290", "$500", "$800", "$650")
sum([int(i.strip("$")) for i in sales])

2240

In [38]:
profits = ("-$290", "$500", "$800", "-$650")
sum([int(i.replace("$", "")) for i in profits])

360

## Functions in Python

#### function definition

In [43]:
def factorial(num) :
    if type(num) == int :
        fact = 1
        for i in range(num, 1, -1):
            fact *= i
        return fact
    else:
        return "Invalid"

#### function call

In [44]:
factorial(5)

120

In [45]:
factorial("abcd")

'Invalid'

In [75]:
def multiply_by_10(num) :
    return num * 10

print(multiply_by_10(5))
print(multiply_by_10("5"))

50
5555555555


##### Note - Unpacking of tuples

In [47]:
tup = 1, 2, 3  # packing of tuples
tup

(1, 2, 3)

In [49]:
a, b, c = tup  # unpacking of tuples
print(a, b, c)

1 2 3


Defining multiple variables in a single line

In [50]:
name, age = "Jane", 30
name

'Jane'

Function returning multiple values

In [52]:
def calculate(num):
    return num**2, num**3
# This function returns multiple values in a tuple object

In [53]:
values = calculate(2)
values

(4, 8)

In [54]:
sq, cub = calculate(2)  # unpacking of tuples

In [55]:
sq

4

In [57]:
cub

8

Using unpacking of tuples in a for-loop

In [60]:
emp = {'Jane': 30, 'Jack': 20, 'Rosie': 25}
for i in emp :
    print(i, " - ", emp[i])

Jane  -  30
Jack  -  20
Rosie  -  25


In [61]:
emp = {'Jane': 30, 'Jack': 20, 'Rosie': 25}
for i in emp.items() :
    print(i)

('Jane', 30)
('Jack', 20)
('Rosie', 25)


In [62]:
emp.items()

dict_items([('Jane', 30), ('Jack', 20), ('Rosie', 25)])

In [63]:
emp = {'Jane': 30, 'Jack': 20, 'Rosie': 25}
for i, j in emp.items() :
    print(i, " - ", j)

Jane  -  30
Jack  -  20
Rosie  -  25


### Function Arguments

#### Required Positional Arguments

In [64]:
def demo(name, age) :
    print(f"Name - {name} | Age - {age}")

In [67]:
demo("Jane", 30)
demo(30, "Jane")
demo("Jane")

Name - Jane | Age - 30
Name - 30 | Age - Jane


TypeError: demo() missing 1 required positional argument: 'age'

Examples - 

In [69]:
strg = "mississippi"
print(strg.replace("i", "*"))
print(strg.replace("*", "i"))

m*ss*ss*pp*
mississippi


In [71]:
lst = [10, 20, 30, 40, 50]
lst.insert(2, "abc")
# lst.insert("abc", 2)  - TypeError: arguments are positional, so the index must come first
lst

[10, 20, 'abc', 30, 40, 50]

In [73]:
lst = [10, 20, 30, 40, 50]
# lst.insert(2, 3)
lst.insert(3, 2)
lst

[10, 20, 30, 2, 40, 50]

In [79]:
list(range(1, 11, 2))

[1, 3, 5, 7, 9]

In [77]:
list(range(11))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [78]:
help(range)

Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
 |  start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
 |  These are exactly the valid indices for a list of 4 elements.
 |  When step is given, it specifies the increment (or decrement).
 |
 |  Methods defined here:
 |
 |  __bool__(self, /)
 |      True if self else False
 |
 |  __contains__(self, key, /)
 |      Return bool(key in self).
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __getitem__(self, key, /)
 |      Return self[key].
 |
 |  __gt__(self, value, /)
 |      Return self>value.
 |
 |  __hash__(self, /)
 |

In [80]:
help(list.insert)

Help on method_descriptor:

insert(self, index, object, /) unbound builtins.list method
    Insert object before index.



#### Default Argument

In [81]:
help(str.replace)

Help on method_descriptor:

replace(self, old, new, count=-1, /) unbound builtins.str method
    Return a copy with all occurrences of substring old replaced by new.

      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.

    If the optional argument count is given, only the first count occurrences are
    replaced.



In [83]:
strg = "mississippi"
print(strg.replace("i", "*"))
print(strg.replace("i", "*", 2))

m*ss*ss*pp*
m*ss*ssippi


In [84]:
help(sorted)

Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.

    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.



In [85]:
def demo(name, age = 30) :
    print(f"Name - {name} | Age - {age}")

In [90]:
demo("Jane", 25)
demo("Jane")
demo(25, "Jane")
demo()

Name - Jane | Age - 25
Name - Jane | Age - 30
Name - 25 | Age - Jane


TypeError: demo() missing 1 required positional argument: 'name'

#### Variable Length Argument

In [91]:
def demo(name, *args, age = 18) :
    print(f"Name - {name} | Age - {age} | marks - {args}")

In [92]:
demo("Jane", 50, 60, 70, 80, 90, 20)

Name - Jane | Age - 18 | marks - (50, 60, 70, 80, 90, 20)


#### Keyword Argument

In [94]:
demo("Jane", 50, 60, 70, 80, 90, age = 20)

Name - Jane | Age - 20 | marks - (50, 60, 70, 80, 90)


#### Variable Length Keyword Argument

In [98]:
def demo(name, *args, age = 18, **kwargs) :
    print(f"Name - {name} | Age - {age} | marks - {args} | Additional details - {kwargs}")

In [99]:
demo("Jane", 50, 60, 70, 80, 90, age = 20, mob = 98765443, gender = "F")

Name - Jane | Age - 20 | marks - (50, 60, 70, 80, 90) | Additional details - {'mob': 98765443, 'gender': 'F'}


#### Significance of `/` and `*`

- **`*`** - All arguments after `*` must be passed as keyword arguments
- **`/`** - All arguments before `/` must be passed positionally (positional-only arguments)

In [101]:
def demo(name, age) :
    print(f"Name - {name} | Age - {age}")

demo("Jane", 30)
demo("Jane", age = 30)
demo(age = 30, name = "Jane")

Name - Jane | Age - 30
Name - Jane | Age - 30
Name - Jane | Age - 30


In [103]:
def demo(name, age, /) :
    print(f"Name - {name} | Age - {age}")

demo("Jane", 30)
# demo("Jane", age = 30)   # error
# demo(age = 30, name = "Jane")  # error

Name - Jane | Age - 30


In [104]:
def demo(name, /, age) :
    print(f"Name - {name} | Age - {age}")

demo("Jane", 30)
demo("Jane", age = 30)
demo(age = 30, name = "Jane")  # error

Name - Jane | Age - 30
Name - Jane | Age - 30


TypeError: demo() got some positional-only arguments passed as keyword arguments: 'name'

In [106]:
def demo(name, *, age) :
    print(f"Name - {name} | Age - {age}")

# demo("Jane", 30) # error
demo("Jane", age = 30)
demo(age = 30, name = "Jane") 

Name - Jane | Age - 30
Name - Jane | Age - 30
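Both markers can appear in the same signature. The sketch below is a hypothetical variant of `demo` with an extra `city` parameter (not part of the original examples). It returns the string instead of printing it so the result is easy to inspect.

```python
# Hypothetical function: `name` is positional-only, `age` may be passed either way,
# and `city` is keyword-only.
def demo(name, /, age, *, city):
    return f"Name - {name} | Age - {age} | City - {city}"

print(demo("Jane", 30, city="Delhi"))
print(demo("Jane", age=30, city="Delhi"))

# Passing `city` positionally is rejected with a TypeError
try:
    demo("Jane", 30, "Delhi")
except TypeError as err:
    print("TypeError:", err)
```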


###### Problem Statement - Store all the details of employees in the list to a file.

In [136]:
def write_to_file(ecode, name, salary):
    with open("emp_details.txt", "a") as file :
        file.write(f"{ecode},{name},{salary}\n")
    print(f"Details of employee {name} added to file")
write_to_file(1, "Jack", 50000)

Details of employee Jack added to file


In [135]:
emps = [
(101, 'Jane', 70000),
(102, 'Rosie', 90000),
(103, 'Mary', 40000),
(104, 'Sam', 55000),
 ]

In [138]:
for e in emps :
    write_to_file(*e)

Details of employee Jane added to file
Details of employee Rosie added to file
Details of employee Mary added to file
Details of employee Sam added to file


In [139]:
data = {"ecode" :101, "name" : 'Jane', "salary" : 70000}
write_to_file(**data)

Details of employee Jane added to file


## Lambda Functions

###### Ex. WAP to define a lambda function to add 2 numbers

In [108]:
add = lambda num1, num2 : num1 + num2

add(2, 3)

5

###### Ex. WALF to return the square of a number

In [109]:
square = lambda num : num ** 2
square(5)

25

## Application of Function Objects

In [110]:
def func(a, b) :
    if a < b :
        return a
    else : 
        return b

In [111]:
var = func(3, 4)
var

3

In [112]:
var = func
var

<function __main__.func(a, b)>

In [113]:
var = len
var

<function len(obj, /)>

In [114]:
var("abcd")

4

###### Ex. WAP to sort the given list

In [115]:
lst = ["flight", "bike", "train", "car"]
sorted(lst) # sorts alphabetically

['bike', 'car', 'flight', 'train']

In [117]:
lst = ["flight", "bike", "train", "car"]
sorted(lst, key = lambda strg : strg[-1]) # sorts as per the last character of each word

['bike', 'train', 'car', 'flight']

In [118]:
lst = ["flight", "bike", "train", "car"]
sorted(lst, key = len)  # sorts by num of characters in each word

['car', 'bike', 'train', 'flight']

###### Ex. WAP to display name and age of the employees in ASC order of their ages

In [122]:
emp = {'Jane': 30, 'Jack': 20, 'Rosie': 25}
dict(sorted(emp.items(), key = lambda tup : tup[1]))

{'Jack': 20, 'Rosie': 25, 'Jane': 30}

###### Ex. WAP to create a dict of names as keys and salaries as values

In [124]:
names = ['Jane', 'Rosie', 'Mary', 'Sam', 'George']
salary = [70000, 90000, 40000, 55000, 76000]
dict(zip(names, salary))

{'Jane': 70000, 'Rosie': 90000, 'Mary': 40000, 'Sam': 55000, 'George': 76000}

###### Ex. WAP to create a dict where keys are emp code starting from 101... and values are tuples of (name, salary)

In [128]:
print(dict(enumerate(zip(names, salary), start = 101)))

{101: ('Jane', 70000), 102: ('Rosie', 90000), 103: ('Mary', 40000), 104: ('Sam', 55000), 105: ('George', 76000)}


In [143]:
lst = ["flight", "bike", "train", "car"]
max(lst)

'train'

In [141]:
max(lst, key = len)

'flight'

In [144]:
max(lst, key = min)

'flight'
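The last result is easier to follow once the key values are listed. With `key = min`, each word is ranked by its alphabetically smallest character, and `max` returns the word whose key is largest. A short check, assuming the same `lst`:

```python
lst = ["flight", "bike", "train", "car"]

# Smallest character of each word, which is what `key = min` compares
keys = {word: min(word) for word in lst}
print(keys)  # {'flight': 'f', 'bike': 'b', 'train': 'a', 'car': 'a'}

# 'f' is the largest of these keys, so "flight" is returned
print(max(lst, key=min))  # flight
```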

## Working on Arrays

In [None]:
!pip install numpy  # Install only if numpy is not already installed

In [145]:
import numpy as np

In [146]:
names = np.array(["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"])
maths = np.array([93, 60, 68, 53, 63, 30, 46, 63, 66, 53])
english = np.array([75, 69, 78, 66, 53, 26, 65, 62, 63, 70])
science = np.array([96, 57, 55, 52, 52, 31, 96, 58, 52, 70])

#### Array Attributes

###### Ex. How many students appeared for the exam?

In [147]:
names.size

10

In [148]:
names.dtype

dtype('<U8')

In [149]:
maths.dtype

dtype('int64')

In [150]:
names.ndim  # dimensions of array

1

#### Accessing Array Elements and Operations on Arrays

###### Ex. Who scored maximum marks in science?

In [156]:
int(science.max())

96

In [157]:
science.argmax()  # Returns the index position of largest element

np.int64(0)

In [158]:
science == science.max()  # Returns a bool array after comparing based on condition

array([ True, False, False, False, False, False,  True, False, False,
       False])

In [159]:
science[science == science.max()]

array([96, 96])

In [160]:
names[science == science.max()]

array(['Olivia', 'Jackson'], dtype='<U8')

###### Ex. How many students have passed in maths?

In [161]:
names[maths >= 35]

array(['Olivia', 'Liam', 'Emma', 'Noah', 'Ava', 'Jackson', 'Isabella',
       'Lucas', 'Mia'], dtype='<U8')

In [162]:
names[maths >= 35].size

9

In [164]:
sum(maths >= 35)

np.int64(9)

###### Ex. Are there any students who have failed in maths? (True or False)

In [169]:
np.all(maths >= 35)   # returns True if all values are True else False

np.False_

In [173]:
np.any(maths < 35)   # returns True if any one value is True else False

np.True_

In [174]:
lst = [True, False, True, ()]
all(lst)

False

In [175]:
lst = [True, True, True, ()]  # bool of empty tuple is always False
all(lst)

False

###### Ex. Have all students cleared their math exams (True or False)

In [176]:
np.all(maths >= 35)

np.False_

###### Ex. Who failed in maths? passing marks - 35

In [187]:
names[maths < 35]

array(['Sophia'], dtype='<U8')

In [190]:
", ".join(names[maths >= 35])  # names of students who passed, as a single string

'Olivia, Liam, Emma, Noah, Ava, Jackson, Isabella, Lucas, Mia'


###### Ex. Calculate percentage of all students and assign grades
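One possible approach to the percentage part is sketched below. It assumes each subject is marked out of 100 (so the maximum total is 300), which the notebook does not state.

```python
import numpy as np

names = np.array(["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"])
maths = np.array([93, 60, 68, 53, 63, 30, 46, 63, 66, 53])
english = np.array([75, 69, 78, 66, 53, 26, 65, 62, 63, 70])
science = np.array([96, 57, 55, 52, 52, 31, 96, 58, 52, 70])

# Assumption: each subject is out of 100, so the maximum total is 300.
total = maths + english + science            # element-wise addition of the three arrays
percentage = np.round(total / 300 * 100, 2)  # percentage rounded to 2 decimals

for name, pct in zip(names, percentage):
    print(f"{name} - {pct}")
```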

###### Assign grades to the students (Failed or pass)
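A two-way label can be produced with `np.where`. The sketch below assumes a student passes only with at least 35 marks in every subject. The pass mark of 35 comes from the earlier exercises, but the "every subject" rule is an assumption.

```python
import numpy as np

names = np.array(["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"])
maths = np.array([93, 60, 68, 53, 63, 30, 46, 63, 66, 53])
english = np.array([75, 69, 78, 66, 53, 26, 65, 62, 63, 70])
science = np.array([96, 57, 55, 52, 52, 31, 96, 58, 52, 70])

# Assumption: passing requires >= 35 in each of the three subjects.
passed_all = (maths >= 35) & (english >= 35) & (science >= 35)

# np.where picks "Pass" where the condition is True and "Failed" elsewhere
result = np.where(passed_all, "Pass", "Failed")

for name, res in zip(names, result):
    print(f"{name} - {res}")
```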

###### Assigning grades as A, B, C, D
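For more than two labels, `np.select` evaluates a list of conditions in order and takes the first match. The grade cut-offs below (A from 80, B from 60, C from 35, D otherwise) and the 300-mark maximum are assumptions made for this sketch; the notebook does not define them.

```python
import numpy as np

names = np.array(["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"])
maths = np.array([93, 60, 68, 53, 63, 30, 46, 63, 66, 53])
english = np.array([75, 69, 78, 66, 53, 26, 65, 62, 63, 70])
science = np.array([96, 57, 55, 52, 52, 31, 96, 58, 52, 70])

percentage = (maths + english + science) / 300 * 100

# Hypothetical cut-offs; conditions are checked top to bottom.
conditions = [percentage >= 80, percentage >= 60, percentage >= 35]
choices = ["A", "B", "C"]
grades = np.select(conditions, choices, default="D")

for name, grade in zip(names, grades):
    print(f"{name} - {grade}")
```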

###### Ex. Display names of students who have scored above class average.
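A boolean mask works here as well. This sketch interprets "class average" as the mean of the students' total marks, which is an assumption about the intended definition.

```python
import numpy as np

names = np.array(["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"])
maths = np.array([93, 60, 68, 53, 63, 30, 46, 63, 66, 53])
english = np.array([75, 69, 78, 66, 53, 26, 65, 62, 63, 70])
science = np.array([96, 57, 55, 52, 52, 31, 96, 58, 52, 70])

total = maths + english + science
class_avg = total.mean()                 # mean of the total marks of all students

above_avg = names[total > class_avg]     # filter names with a boolean mask
print(", ".join(above_avg))
```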

### Working on Dataframes

#### Reading data from various sources (excel/csv, database, json)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#### Creating a DataFrame from lists/arrays

In [None]:
names = np.array(["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"])
maths = np.array([93, 60, 68, 53, 63, 30, 46, 63, 66, 53])
english = np.array([75, 69, 78, 66, 53, 26, 65, 62, 63, 70])
science = np.array([96, 57, 55, 52, 52, 31, 96, 58, 52, 70])
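
A minimal sketch of building the DataFrame from these arrays: each array becomes one column through a dict passed to `pd.DataFrame`. The column labels are chosen here for illustration.

```python
import numpy as np
import pandas as pd

names = np.array(["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"])
maths = np.array([93, 60, 68, 53, 63, 30, 46, 63, 66, 53])
english = np.array([75, 69, 78, 66, 53, 26, 65, 62, 63, 70])
science = np.array([96, 57, 55, 52, 52, 31, 96, 58, 52, 70])

# Keys become column labels (hypothetical names), arrays become column values.
df = pd.DataFrame({"Name": names, "Maths": maths, "English": english, "Science": science})

print(df.head())   # first five rows
print(df.shape)    # (rows, columns)
```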


###### Ex. Create new columns as Total Marks, Percentage, Rank and Grade

###### Ex. What are the average marks scored in Maths by students with Grade B?
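One way to approach the two exercises above is sketched below. The column labels, the 300-mark maximum and the grade cut-offs (A from 80, B from 60, C from 35, D otherwise) are all assumptions, since the notebook does not specify them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"],
    "Maths": [93, 60, 68, 53, 63, 30, 46, 63, 66, 53],
    "English": [75, 69, 78, 66, 53, 26, 65, 62, 63, 70],
    "Science": [96, 57, 55, 52, 52, 31, 96, 58, 52, 70],
})

# New columns derived from the subject marks
df["Total"] = df[["Maths", "English", "Science"]].sum(axis=1)
df["Percentage"] = (df["Total"] / 300 * 100).round(2)                      # assumes 300 max marks
df["Rank"] = df["Total"].rank(ascending=False, method="min").astype(int)  # 1 = highest total

# Hypothetical grade cut-offs, checked top to bottom
pct = df["Percentage"]
df["Grade"] = np.select([pct >= 80, pct >= 60, pct >= 35], ["A", "B", "C"], default="D")

# Average Maths marks of students with Grade B
avg_maths_b = df.loc[df["Grade"] == "B", "Maths"].mean()
print(df)
print(avg_maths_b)
```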

### Connecting to a database

In [None]:
!pip install sqlalchemy

In [None]:
from sqlalchemy import create_engine
conn = create_engine("sqlite:///employee.sqlite3")
conn
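
Once a connection exists, a query result can be loaded into a DataFrame with `pd.read_sql_query`, which accepts a SQLAlchemy engine such as `conn` as well as a plain `sqlite3` connection. The sketch below uses an in-memory SQLite database so it runs without the `employee.sqlite3` file. The `employees` table and its rows are made up for illustration.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database with a hypothetical `employees` table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (ecode INTEGER, name TEXT, salary INTEGER)")
con.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(101, "Jane", 70000), (102, "Rosie", 90000), (103, "Mary", 40000)],
)
con.commit()

# Run a SQL query and load the result set into a DataFrame
df = pd.read_sql_query("SELECT * FROM employees WHERE salary > 50000", con)
print(df)

con.close()
```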

In [None]:
# Connection string format for MS SQL Server (replace the placeholders):
# mssql://*server_name*/*database_name*?trusted_connection=yes

### Connect to JSON Object
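
A minimal sketch of turning JSON into a DataFrame, assuming the JSON is a list of records. The `json_text` string is made up and stands in for the contents of a file or an API response.

```python
import json
import pandas as pd

# Made-up JSON text; in practice this would come from a file or a web API.
json_text = '[{"name": "Jane", "age": 30}, {"name": "Jack", "age": 20}]'

records = json.loads(json_text)   # parse JSON text into a list of dicts
df = pd.DataFrame(records)        # one row per dict, one column per key
print(df)
```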

### Read data from CSV

Method 1 - Set the current working directory as the file path

Method 2 - Upload the files to working environment using jupyter upload button

#### Reading data from csv file

#### Handling null values

#### Replacing nulls

#### Remove nulls
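
The three steps above (detect, replace, remove) can be sketched on a small made-up DataFrame. The column names and the fill strategy (mean for numbers, a placeholder for text) are assumptions, not the course dataset.

```python
import numpy as np
import pandas as pd

# Made-up data with missing values; column names are hypothetical.
df = pd.DataFrame({
    "Product": ["Tea", "Coffee", None, "Tea"],
    "Sales": [120.0, np.nan, 90.0, 150.0],
})

# Handling: count the nulls in each column
null_counts = df.isna().sum()
print(null_counts)

# Replacing: numeric nulls with the column mean, text nulls with a placeholder
filled = df.copy()
filled["Sales"] = filled["Sales"].fillna(filled["Sales"].mean())
filled["Product"] = filled["Product"].fillna("Unknown")

# Removing: drop every row that contains at least one null
dropped = df.dropna()
print(filled)
print(dropped)
```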

#### Cleaning and Preparing Data

###### Ex. Find the overall average sales (HINT - use mean() on the Sales column)

###### Converting date field
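
Date columns read from a file usually arrive as text. A minimal sketch of the conversion with `pd.to_datetime`, assuming a hypothetical `Date` column in day-month-year format:

```python
import pandas as pd

# Made-up data; the real column name and date format may differ.
df = pd.DataFrame({"Date": ["01-02-2024", "15-03-2024"], "Sales": [100, 200]})

# Parse the text into datetime values using an explicit format
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Datetime columns expose parts of the date through the .dt accessor
df["Month"] = df["Date"].dt.month
print(df.dtypes)
print(df)
```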

###### Ex. Create column for Target Status

###### Ex. Visualise Target status on a bar chart

###### Ex. Visualise product-wise Sales

###### Ex. Display product-wise total sales across state Manipur in DESC Order. Find the product generating maximum sales.
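
A possible shape for the solution: filter, group, sum, then sort. The data and the column names `State`, `Product` and `Sales` are made up for this sketch and may not match the course dataset.

```python
import pandas as pd

# Made-up sample; not the course dataset.
df = pd.DataFrame({
    "State": ["Manipur", "Manipur", "Manipur", "Assam", "Manipur"],
    "Product": ["Tea", "Coffee", "Tea", "Tea", "Coffee"],
    "Sales": [100, 250, 120, 400, 50],
})

# Keep only the rows for Manipur
manipur = df[df["State"] == "Manipur"]

# Total sales per product, highest first
product_sales = manipur.groupby("Product")["Sales"].sum().sort_values(ascending=False)
print(product_sales)

# Product with the maximum total sales
top_product = product_sales.idxmax()
print(top_product)
```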