# Data Analytics and Visualization with Python

### Learning Objective - 

- Introduction to Analytics using Python
    - Python Basics for Analytics (Revision)
    - numpy and pandas library
    - Reading data from various sources (excel, csv, database, json)
    - Cleaning and Preparing Data
- Descriptive Statistics
- Visualizing Data
    - Introduction to matplotlib library
    - Anatomy of a figure
    - Creating sub-plots
    - Chart aesthetics
- Visual Data Analytics
    - Univariate Analysis
        - count plots
        - histograms and boxplot
    - Bivariate Analysis
        - scatter plot
        - bar plot
        - line charts
        - pair plots, heatmaps

## Python Basics for Analytics 

#### Built-in data structure - 
- list  - [], mutable, mixed data, indexed
- tuples - (), immutable, mixed data, indexed
- set  - {}, mutable, no duplicates, unordered, mixed data but only immutable objects
- dict - {}, mutable, key:value, key - no duplicates, immutable object; Value - any type
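The properties listed above can be checked with one operation per container. This is a minimal sketch; the variable names and values are made up for illustration.

```python
# Each built-in container, with one operation that shows its key property.
marks_list = [70, "absent", 85]        # list: mutable, mixed data, indexed
marks_list[1] = 60                     # in-place change is allowed

point = (10, 20)                       # tuple: immutable, indexed
try:
    point[0] = 99                      # tuples reject item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

cities = {"Mumbai", "Pune", "Mumbai"}  # set: the duplicate value is dropped
emp = {"Jane": 30, "Jack": 20}         # dict: unique keys mapped to values
emp["Jane"] = 31                       # assigning to an existing key overwrites it

print(marks_list, tuple_changed, len(cities), emp)
```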

#### Python Functions

- sorted(), zip(), enumerate(), lambda functions

###### Ex. WAP to print sum of numbers 1 - 10

In [None]:
sum(range(1, 11))

In [None]:
import math
math.prod(range(1, 11))

###### Ex. WAP to sort the given list in DESC order

In [None]:
numbers = [1, 3, 4, 2, 5]
numbers.sort(reverse=True)
numbers

In [None]:
numbers = (1, 3, 4, 2, 5)
sorted(numbers, reverse=True)  # sorted - sorts any sequence and returns a list object

###### Ex. WAP to replace all vowels in a string with "*"

In [None]:
word = input("Enter a word - ")
for ch in "aeiouAEIOU" :
    word = word.replace(ch, "*")
word

In [None]:
word = input("Enter a word - ")
trans_obj = str.maketrans("aeiou", "@3!0_")
word.translate(trans_obj)

In [None]:
print("-"*50)

###### Ex. WAP to swap first and last character of a word

In [None]:
word = input("Enter a word - ") # i/p - "mumbai"  o/p - "iumbam"
word[-1] + word[1 : -1] + word[0]

#### Ex. Calculate Gross Pay
- Take hours worked and rate per hour as input from the user.
- If the hours worked are 40 or less, apply the given rate.
- If the hours worked exceed 40, apply the given rate for the first 40 hours and 1.5 times the rate for the additional hours as overtime pay.

In [None]:
hours = int(input("Enter number of hours - "))
rate = int(input("Enter rate per hour - "))

if hours <= 40 :
    gross_pay = hours * rate
else:
    gross_pay = (40 * rate) + (hours - 40) * 1.5 * rate
gross_pay

###### Ex. WAP to generate a list of squares of numbers in range of 1-10

In [None]:
[i**2 for i in range(1, 11)]

###### Ex. WAP to generate a list of squares of numbers divisible by 3 in range of 1-20

In [None]:
[i**2 for i in range(1, 21) if i % 3 == 0]

###### Ex. WAP to create dict of numbers in range 1-10 as keys and their squares as values

In [None]:
{i : i**2 for i in range(1, 11)}

###### Ex. WAP to create a dict of numbers divisible by 3 in range 1-20 as keys and their type(even or odd) as values

In [None]:
{i : "even" if i % 2 == 0 else "odd" for i in range(1, 21) if i % 3 == 0}

`Comprehensions` are an elegant way to define and create mutable data structures such as lists, sets and dictionaries from existing sequences.

Syntax - 

`[<expression> for <var> in <sequence> if <condition>]`

1. Identify the sequence
2. Identify condition if any
3. Expression
4. Mutable data structure
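The four steps above can be traced in a set comprehension and a dict comprehension. This is a small sketch; the `words` list is an illustrative assumption, not data from the course.

```python
words = ["flight", "bike", "train", "car", "bus"]

# 1. sequence: words            2. condition: more than 3 characters
# 3. expression: first letter   4. data structure: set, so use { }
initials = {w[0].upper() for w in words if len(w) > 3}

# Same sequence and condition, with a dict as the target structure
lengths = {w: len(w) for w in words if len(w) > 3}

print(sorted(initials))  # sets are unordered, so sort before printing
print(lengths)
```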

###### Ex. WAP to add 7% service tax to all the values in the "sales" list

In [None]:
sales = [290, 500, 800, 650]
[i * 1.07 for i in sales]

###### Ex. WAP to sum all the values in the "sales" tuple

In [None]:
sales = ("$290", "$500", "$800", "$650")
sum([int(i.replace("$", "")) for i in sales])

In [None]:
sales = ("$290", "$500", "$800", "$650")
sum([int(i.strip("$")) for i in sales])

In [None]:
profits = ("-$290", "$500", "$800", "-$650")
sum([int(i.replace("$", "")) for i in profits])

## Functions in Python

#### function definition

In [None]:
def factorial(num) :
    if type(num) == int :
        fact = 1
        for i in range(num, 1, -1):
            fact *= i
        return fact
    else:
        return "Invalid"

#### function call

In [None]:
factorial(5)

In [None]:
factorial("abcd")

In [None]:
def multiply_by_10(num) :
    return num * 10

print(multiply_by_10(5))
print(multiply_by_10("5"))

##### Note - Unpacking of tuples

In [None]:
tup = 1, 2, 3  # packing of tuples
tup

In [None]:
a, b, c = tup  # unpacking of tuples
print(a, b, c)

Defining multiple variables in a single line

In [None]:
name, age = "Jane", 30
name

Function returning multiple values

In [None]:
def calculate(num):
    return num**2, num**3
# This function returns multiple values in a tuple object

In [None]:
values = calculate(2)
values

In [None]:
sq, cub = calculate(2)  # unpacking of tuples

In [None]:
sq

In [None]:
cub

Using unpacking of tuples in a for-loop

In [None]:
emp = {'Jane': 30, 'Jack': 20, 'Rosie': 25}
for i in emp :
    print(i, " - ", emp[i])

In [None]:
emp = {'Jane': 30, 'Jack': 20, 'Rosie': 25}
for i in emp.items() :
    print(i)

In [None]:
emp.items()

In [None]:
emp = {'Jane': 30, 'Jack': 20, 'Rosie': 25}
for i, j in emp.items() :
    print(i, " - ", j)

### Function Arguments

#### Required Positional Arguments

In [None]:
def demo(name, age) :
    print(f"Name - {name} | Age - {age}")

In [None]:
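demo("Jane", 30)
demo(30, "Jane")   # runs, but the values land in the wrong parameters
demo("Jane")       # TypeError - missing required positional argument 'age'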

Examples - 

In [None]:
strg = "mississippi"
print(strg.replace("i", "*"))
print(strg.replace("*", "i"))

In [None]:
lst = [10, 20, 30, 40, 50]
lst.insert(2, "abc")
# lst.insert("abc", 2)  - error - positional argument
lst

In [None]:
lst = [10, 20, 30, 40, 50]
# lst.insert(2, 3)
lst.insert(3, 2)
lst

In [None]:
list(range(1, 11, 2))

In [None]:
list(range(11))

In [None]:
help(range)

In [None]:
help(list.insert)

#### Default Argument

In [None]:
help(str.replace)

In [None]:
strg = "mississippi"
print(strg.replace("i", "*"))
print(strg.replace("i", "*", 2))

In [None]:
help(sorted)

In [None]:
def demo(name, age = 30) :
    print(f"Name - {name} | Age - {age}")

In [None]:
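demo("Jane", 25)
demo("Jane")        # age falls back to the default value 30
demo(25, "Jane")    # runs, but the values land in the wrong parameters
demo()              # TypeError - 'name' has no default value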

#### Variable Length Argument

In [None]:
def demo(name, *args, age = 18) :
    print(f"Name - {name} | Age - {age} | marks - {args}")

In [None]:
demo("Jane", 50, 60, 70, 80, 90, 20)

#### Key-word Argument

In [None]:
demo("Jane", 50, 60, 70, 80, 90, age = 20)

#### Variable length key-word Argument

In [None]:
def demo(name, *args, age = 18, **kwargs) :
    print(f"Name - {name} | Age - {age} | marks - {args} | Additional details - {kwargs}")

In [None]:
demo("Jane", 50, 60, 70, 80, 90, age = 20, mob = 98765443, gender = "F")

#### Significance of `/` and `*`

- **`*`** - All arguments after `*` must be key-word arguments
- **`/`** - All arguments before `/` must be positional-only arguments

In [None]:
def demo(name, age) :
    print(f"Name - {name} | Age - {age}")

demo("Jane", 30)
demo("Jane", age = 30)
demo(age = 30, name = "Jane")

In [None]:
def demo(name, age, /) :
    print(f"Name - {name} | Age - {age}")

demo("Jane", 30)
# demo("Jane", age = 30)   # error
# demo(age = 30, name = "Jane")  # error

In [None]:
def demo(name, /, age) :
    print(f"Name - {name} | Age - {age}")

demo("Jane", 30)
demo("Jane", age = 30)
demo(age = 30, name = "Jane")  # error

In [None]:
def demo(name, *, age) :
    print(f"Name - {name} | Age - {age}")

# demo("Jane", 30) # error
demo("Jane", age = 30)
demo(age = 30, name = "Jane") 
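Both markers can appear in the same signature. The sketch below assumes Python 3.8 or later (where `/` is available) and adds a hypothetical `city` parameter purely for illustration.

```python
def demo(name, /, age, *, city):
    # name: positional-only | age: positional or key-word | city: key-word only
    return f"Name - {name} | Age - {age} | City - {city}"

print(demo("Jane", 30, city="Mumbai"))
print(demo("Jane", age=30, city="Mumbai"))

try:
    demo("Jane", 30, "Mumbai")   # city passed positionally
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```

Parameters between `/` and `*` (here `age`) stay flexible and accept either calling style.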

###### Problem Statement - Store all the details of employees in the list to a file.

In [None]:
def write_to_file(ecode, name, salary):
    with open("emp_details.txt", "a") as file :
        file.write(f"{ecode},{name},{salary}\n")
    print(f"Details of employee {name} added to file")
write_to_file(1, "Jack", 50000)

In [None]:
emps = [
(101, 'Jane', 70000),
(102, 'Rosie', 90000),
(103, 'Mary', 40000),
(104, 'Sam', 55000),
 ]

In [None]:
for e in emps :
    write_to_file(*e)

In [None]:
data = {"ecode" :101, "name" : 'Jane', "salary" : 70000}
write_to_file(**data)

## Lambda Functions

###### Ex. WAP to define a lambda function to add 2 numbers

In [None]:
add = lambda num1, num2 : num1 + num2

add(2, 3)

###### WALF to return the square of a number

In [None]:
square = lambda num : num ** 2
square(5)

## Application of Function Objects

In [None]:
def func(a, b) :
    if a < b :
        return a
    else : 
        return b

In [None]:
var = func(3, 4)
var

In [None]:
var = func
var

In [None]:
var = len
var

In [None]:
var("abcd")

###### Ex. WAP to sort the given list

In [None]:
lst = ["flight", "bike", "train", "car"]
sorted(lst) # sorts alphabetically

In [None]:
lst = ["flight", "bike", "train", "car"]
sorted(lst, key = lambda strg : strg[-1]) # sorts as per the last character of each word

In [None]:
lst = ["flight", "bike", "train", "car"]
sorted(lst, key = len)  # sorts by num of characters in each word

###### Ex. WAP to display name and age of the employees in ASC order of their ages

In [None]:
emp = {'Jane': 30, 'Jack': 20, 'Rosie': 25}
dict(sorted(emp.items(), key = lambda tup : tup[1]))

###### Ex. WAP to create a dict of names as keys and salaries as values

In [None]:
names = ['Jane', 'Rosie', 'Mary', 'Sam', 'George']
salary = [70000, 90000, 40000, 55000, 76000]
dict(zip(names, salary))

###### Ex. WAP to create a dict where keys are emp code starting from 101... and values are tuples of (name, salary)

In [None]:
print(dict(enumerate(zip(names, salary), start = 101)))

In [None]:
lst = ["flight", "bike", "train", "car"]
max(lst)

In [None]:
max(lst, key = len)

In [None]:
max(lst, key = min)

## Working on Arrays

In [None]:
!pip install numpy  # Install only if np is not present

In [None]:
import numpy as np

In [None]:
names = np.array(["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"])
maths = np.array([93, 60, 68, 53, 63, 30, 46, 63, 66, 53])
english = np.array([75, 69, 78, 66, 53, 26, 65, 62, 63, 70])
science = np.array([96, 57, 55, 52, 52, 31, 96, 58, 52, 70])

#### Array Attributes

###### How many students appeared for the exam?

In [None]:
names.size

In [None]:
names.dtype

In [None]:
maths.dtype

In [None]:
names.ndim  # dimensions of array

#### Accessing Array elements and Operations on Arrays

###### Ex. Who scored maximum marks in science?

In [None]:
int(science.max())

In [None]:
science.argmax()  # Returns the index position of largest element

In [None]:
science == science.max()  # Returns a bool array after comparing based on condition

In [None]:
science[science == science.max()]

In [None]:
names[science == science.max()]

###### Ex. How many students have passed in maths?

In [None]:
names[maths >= 35]

In [None]:
names[maths >= 35].size

In [None]:
sum(maths >= 35)

###### Ex. Are there any students who have failed in maths? (True or False)

In [None]:
np.all(maths >= 35)   # returns True if all values are True else False; a False result means at least one student failed

In [None]:
np.any(english < 35)   # returns True if any one value is True else False

In [None]:
lst = [True, False, True, ()]
all(lst)

In [None]:
lst = [True, True, True, ()]  # bool of empty tuple is always False
all(lst)

###### Ex. Have all students cleared their math exams (True or False)

In [None]:
np.all(maths >= 35)

###### Ex. Who failed in maths? passing marks - 35

In [None]:
names[maths< 35]

In [None]:
", ".join(names[maths < 35])  # failed students as a single comma-separated string

###### Ex. Calculate percentage of all students and assign grades

In [None]:
percentage = np.round((maths + english + science)/3, 2)
percentage

###### Assign grades to the students (Failed or pass)

In [None]:
grades_1 = np.where(percentage >= 35, "Passed", "Failed")
grades_1

###### Assigning grades as A, B, C, D

In [None]:
conditions = [percentage >= 75, percentage >= 60, percentage >= 40]
results = ["Grade A", "Grade B", "Grade C"]
np.select(conditions, results, "Grade D")

In [None]:
help(np.select)

###### Ex. Display names of students who have scored above class average.

In [None]:
names[percentage >= percentage.mean()]

In [None]:
names[maths >= maths.mean()]

###### Ex. How many students are above the class average and also above average in all three subjects?

In [None]:
above_class_avg = names[percentage >= percentage.mean()]
above_maths_avg = names[maths >= maths.mean()]
above_sci_avg = names[science >= science.mean()]
above_eng_avg = names[english >= english.mean()]

In [None]:
from functools import reduce
# sum of numbers in range of 1-10 (use lambda x, y : x * y for the product)
reduce(lambda x, y : x + y, range(1, 11))

In [None]:
results = [above_class_avg, above_eng_avg, above_maths_avg, above_sci_avg]
reduce(np.intersect1d, results)

In [None]:
np.intersect1d(np.intersect1d(np.intersect1d(above_class_avg, above_eng_avg), above_maths_avg), above_sci_avg)

## Working on Dataframes

#### Reading data from various sources (excel/csv, database, json)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#### Creating a DataFrame from lists/arrays

In [None]:
names = np.array(["Olivia", "Liam", "Emma", "Noah", "Ava", "Sophia", "Jackson", "Isabella", "Lucas", "Mia"])
maths = np.array([93, 60, 68, 53, 63, 30, 46, 63, 66, 53])
english = np.array([75, 69, 78, 66, 53, 26, 65, 62, 63, 70])
science = np.array([96, 57, 55, 52, 52, 31, 96, 58, 52, 70])

students = {"Name" : names, "Maths" : maths, "English" : english, "Science": science}
df = pd.DataFrame(students)
df

In [None]:
df.shape

In [None]:
df.dtypes

In [None]:
df.head(2)

In [None]:
df.tail()

In [None]:
df[df.Maths < 35]

In [None]:
np.all(df.Maths >= 35) # Have all the students passed in maths?

###### Ex. Create new columns as Total Marks, Percentage, Rank and Grade

In [None]:
df["Total Marks"] = df.Maths + df.English + df.Science
df["Percentage"] = np.round(df["Total Marks"]/3, 2)
conditions = [df.Percentage >= 75, df.Percentage >= 60, df.Percentage >= 40]
results = ["Grade A", "Grade B", "Grade C"]
df["Grade"] = np.select(conditions, results, "Grade D")
df["Rank"] = df.Percentage.rank(ascending=False).astype(int)
df.sort_values("Rank", inplace = True, ignore_index= True)
df

###### Ex. What is AVG marks scored by students in Maths getting Grade B 

In [None]:
int(df[df.Grade == "Grade B"].Maths.mean())

### Connecting to database

In [None]:
!pip install sqlalchemy

In [None]:
from sqlalchemy import create_engine
conn = create_engine("sqlite:///employee.sqlite3")
conn

In [None]:
pd.read_sql("Employee", conn)

In [None]:
pd.read_sql_query("Select * from Employee where Designation = 'Manager'", conn)

In [None]:
# Connection URL format for MS SQL Server (replace the placeholders) - 
# mssql://*server_name*/*database_name*?trusted_connection=yes

### Connect to JSON Object

In [None]:
import requests

In [None]:
json_obj = requests.get("http://127.0.0.1:5000/tasks").json()

In [None]:
pd.DataFrame(json_obj)

In [None]:
json_obj = requests.get("https://jsonplaceholder.typicode.com/posts").json()

pd.DataFrame(json_obj)

### Read data from CSV

Method 1 - Set the current working directory as the file path

In [None]:
import os
os.chdir(r"./Datasets/")

Method 2 - Upload the files to working environment using jupyter upload button

#### Reading data from csv file

In [None]:
df = pd.read_csv("coffee_sales.csv", header=3)
df.head()

#### Handling null values

In [None]:
df.dropna(axis = 1, how="all", inplace=True)
# df.dropna(how="all")
df.head()

In [None]:
df.isna().any()

In [None]:
df.isna().sum()

#### Remove nulls

In [None]:
df.dropna()  # Returns a new DataFrame without the rows that contain any null value; df itself is unchanged

#### Replacing nulls

In [None]:
df["Target Profit"].fillna("0", inplace=True)  # Older technique - chained in-place fillna is deprecated and stops updating df in pandas 3.0

In [None]:
df.fillna({"Target Profit" : "0"}, inplace=True)  # Recommended technique - pass a {column : value} dict to DataFrame.fillna
df.head()
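The dict form of `fillna` can also be used without `inplace=True` by assigning the result back. A minimal sketch on a made-up three-row frame (the `sample` values are assumptions, not rows from `coffee_sales.csv`):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing Target Profit value
sample = pd.DataFrame({
    "Profit": ["$94", "$34", "$-2"],
    "Target Profit": ["$100", np.nan, "$30"],
})

# Fill nulls only in the named column and keep the returned DataFrame
sample = sample.fillna({"Target Profit": "0"})
print(sample["Target Profit"].tolist())
```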

In [None]:
df.isna().any()

In [45]:
df.dtypes

Date             object
Franchise        object
City             object
Product          object
Sales            object
Profit           object
Target Profit    object
Target Sales     object
dtype: object

#### Cleaning and Preparing Data

In [47]:
strg = "($1,200)"  # -1200

In [50]:
trans_obj = str.maketrans("(", "-", "$,)")
int(strg.translate(trans_obj))

-1200

In [59]:
trans_obj = str.maketrans("", "", "$,")  # Note - use translate if more than 1 replace statements are needed
df.Sales = df.Sales.str.translate(trans_obj).astype(float)
df["Target Sales"] = df["Target Sales"].str.translate(trans_obj).astype(float)
df.Profit = df.Profit.str.replace("$", "").astype(float)
df["Target Profit"] = df["Target Profit"].str.replace("$", "").astype(float)
df.head()

| | Date | Franchise | City | Product | Sales | Profit | Target Profit | Target Sales |
|---|---|---|---|---|---|---|---|---|
| 0 | 1-Jan-21 | M1 | Mumbai | Amaretto | 219.0 | 94.0 | 100.0 | 220.0 |
| 1 | 1-Feb-21 | M1 | Mumbai | Amaretto | 140.0 | 34.0 | 50.0 | 140.0 |
| 2 | 1-Mar-21 | M1 | Mumbai | Amaretto | 145.0 | -2.0 | 30.0 | 180.0 |
| 3 | 1-Apr-21 | M1 | Mumbai | Amaretto | 45.0 | 11.0 | 20.0 | 40.0 |
| 4 | 1-May-21 | M1 | Mumbai | Amaretto | 120.0 | 13.0 | 30.0 | 120.0 |


In [60]:
df.dtypes

Date              object
Franchise         object
City              object
Product           object
Sales            float64
Profit           float64
Target Profit    float64
Target Sales     float64
dtype: object

In [61]:
# Final Code ------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("coffee_sales.csv", header=3)

df.dropna(axis = 1, how="all", inplace=True)
df.fillna({"Target Profit" : "0"}, inplace=True) 

trans_obj = str.maketrans("", "", "$,")  # Note - use translate if more than 1 replace statements are needed
df.Sales = df.Sales.str.translate(trans_obj).astype(float)
df["Target Sales"] = df["Target Sales"].str.translate(trans_obj).astype(float)
df.Profit = df.Profit.str.replace("$", "").astype(float)
df["Target Profit"] = df["Target Profit"].str.replace("$", "").astype(float)

df.head()

| | Date | Franchise | City | Product | Sales | Profit | Target Profit | Target Sales |
|---|---|---|---|---|---|---|---|---|
| 0 | 1-Jan-21 | M1 | Mumbai | Amaretto | 219.0 | 94.0 | 100.0 | 220.0 |
| 1 | 1-Feb-21 | M1 | Mumbai | Amaretto | 140.0 | 34.0 | 50.0 | 140.0 |
| 2 | 1-Mar-21 | M1 | Mumbai | Amaretto | 145.0 | -2.0 | 30.0 | 180.0 |
| 3 | 1-Apr-21 | M1 | Mumbai | Amaretto | 45.0 | 11.0 | 20.0 | 40.0 |
| 4 | 1-May-21 | M1 | Mumbai | Amaretto | 120.0 | 13.0 | 30.0 | 120.0 |


###### Ex. Find total avg sales (HINT - use mean() in Sales column)

In [64]:
float(np.round(df.Sales.mean(), 2))

192.99

###### Ex. Find total profits generated by Product "Amaretto"

In [67]:
df[df.Product == "Amaretto"].Profit.sum()

np.float64(5915.0)

###### Ex. Find total profits generated by Product "Amaretto" and 'Caffe Latte'

In [71]:
df[df.Product.isin(("Amaretto", 'Caffe Latte'))].Profit.sum()

np.float64(17290.0)

###### Ex. Find total profits generated by Product "Amaretto" in City "Mumbai"

In [73]:
df[(df.Product == "Amaretto") & (df.City == "Mumbai")].Profit.sum()

np.float64(5915.0)

In [74]:
df[np.logical_and((df.Product == "Amaretto"),(df.City == "Mumbai"))].Profit.sum()

np.float64(5915.0)


###### Converting date field
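One possible approach, assuming every value in the `Date` column follows the day-month-year pattern seen in the output above (for example `1-Jan-21`), is `pd.to_datetime` with an explicit format. The sketch uses a small stand-in Series rather than the real column.

```python
import pandas as pd

# Stand-in for df.Date, using values shown in the df.head() output
dates = pd.Series(["1-Jan-21", "1-Feb-21", "1-Mar-21"], name="Date")

# %d - day, %b - abbreviated month name, %y - two-digit year
parsed = pd.to_datetime(dates, format="%d-%b-%y")

print(parsed.dtype)
print(parsed.dt.month.tolist())
```

On the full dataset the same call would be applied as `df.Date = pd.to_datetime(df.Date, format="%d-%b-%y")`, provided the column has no other date formats.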

###### Ex. Create column for Target Status
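A possible sketch using `np.where`, the same function used earlier for Passed/Failed grades. The rule (target is met when `Sales` is at least `Target Sales`) and the labels "Achieved" / "Not Achieved" are assumptions, since the exercise does not define them; the sample values are the five rows shown in the `df.head()` output.

```python
import numpy as np
import pandas as pd

# Sales and Target Sales values taken from the df.head() output above
sample = pd.DataFrame({
    "Sales": [219.0, 140.0, 145.0, 45.0, 120.0],
    "Target Sales": [220.0, 140.0, 180.0, 40.0, 120.0],
})

# Assumed rule: the target is achieved when Sales >= Target Sales
sample["Target Status"] = np.where(
    sample["Sales"] >= sample["Target Sales"], "Achieved", "Not Achieved"
)
print(sample)
```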

###### Ex. Visualise Target status on a bar chart

###### Ex. Visualise product-wise Sales

###### Ex. Display product-wise total sales across state Manipur in DESC Order. Find the product generating maximum sales.