<a href="https://colab.research.google.com/github/harisgulzar1/pythonbasics/blob/main/PythonCourseLecture2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Let's move towards real data analytics! 💪

In this lecture we will study:

*   Numerical Operations in Python (**NumPy**)
*   Labeled, tabular data (**Pandas**)
*   Reading data from files and doing analysis.



**NumPy**

Advanced functionality for working with arrays in Python.

In [None]:
import numpy as np

a = np.array([1, 2, 3])

print(a)

print(a[0])

print(a[1:])

print("Shape: ", a.shape) # attribute of a NumPy array: size along each dimension

print("Dimension: ", a.ndim) # attribute of a NumPy array: number of dimensions

In [None]:
# 2D arrays in NumPy are created from nested lists, just like 2D lists in Python

b = np.array([[1,2,3],
              [4,5,6]])

print(b)

print("Shape:", b.shape)
print("Dimension:", b.ndim)
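
The relationship between `shape`, `ndim`, and `size` can be checked directly. The following is a minimal sketch; the array name `m` is only illustrative and is not used elsewhere in this lecture.

```python
import numpy as np

# A 2x3 array: 2 rows, 3 columns
m = np.array([[1, 2, 3],
              [4, 5, 6]])

rows, cols = m.shape            # shape is a tuple with one entry per dimension
print(rows, cols)               # 2 3
print(m.ndim == len(m.shape))   # True: ndim is the length of the shape tuple
print(m.size == rows * cols)    # True: size is the total number of elements
```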

Initializing arrays with some default values.

In [None]:
# initialize with zeros


a = np.zeros((2,3))

a

In [None]:
b = np.ones((2,3))

b

In [None]:
# initialize with random values
# uniformly distributed in [0, 1) by default

c = np.random.random((3,3))
c

In [None]:
# random integers from 0 to 4 (the upper bound 5 is excluded)
c = np.random.randint(0,5, (3,3))
c
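
Random arrays change on every run, which can make results hard to compare. One possible way to get repeatable values is a seeded generator from `np.random.default_rng`. This is a sketch only: the seed `42` and the variable names are arbitrary choices, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(42)           # 42 is an arbitrary seed
floats = rng.random((3, 3))               # floats in [0, 1)
ints = rng.integers(0, 5, size=(3, 3))    # integers from 0 to 4

# A second generator with the same seed reproduces the same first draw
rng_again = np.random.default_rng(42)
print(np.array_equal(floats, rng_again.random((3, 3))))  # True
```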

**Selecting dimensions of NumPy arrays (indexing)**

More flexible than plain Python lists.

In [None]:
a_new = np.array([[1,2,3],
                 [4,5,6],
                 [7,8,9]])

# selecting a row of a 2D array (here, the first row)
a_rows = a_new[0,:]
a_rows

In [None]:
# selecting a column of a 2D array (here, the first column)

a_columns = a_new[:,0]
a_columns
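
NumPy indexes 2D arrays as `[row, column]`, and a bare `:` means "everything along this dimension". The sketch below shows a row, a column, and a sub-block side by side; the name `grid` is only illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

first_row = grid[0, :]    # row 0, all columns    -> 1, 2, 3
first_col = grid[:, 0]    # all rows, column 0    -> 1, 4, 7
block = grid[:2, 1:]      # rows 0-1, columns 1-2 -> [[2, 3], [5, 6]]

print(first_row)
print(first_col)
print(block)
```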

In [None]:
# Make a sequence from 3 up to (but not including) 10, in steps of 1

d = np.arange(3, 10, 1)
d

**Numerical Operations in Python (Numpy)**

In [None]:
a = np.array([
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8]
])

b = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Addition

c = a + b
c

In [None]:
# element-wise multiplication

d = a*b
d

In [None]:
# element-wise division

e = a/b
e

In [None]:
# matrix multiplication (equivalent to a @ b for 2D arrays)

f = np.dot(a,b)
f
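
It is easy to confuse `*` with matrix multiplication. The sketch below uses two small illustrative matrices, `p` and `q` (not used elsewhere in this lecture), to contrast the two operations.

```python
import numpy as np

p = np.array([[1, 2],
              [3, 4]])
q = np.array([[5, 6],
              [7, 8]])

elementwise = p * q       # multiplies matching entries: [[5, 12], [21, 32]]
matrix = np.dot(p, q)     # rows of p times columns of q: [[19, 22], [43, 50]]

print(elementwise)
print(matrix)
print(np.array_equal(matrix, p @ q))  # True: @ is the matrix-multiplication operator
```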

In [None]:
# square-root

g = np.sqrt(a)
g

In [None]:
# power
n = 2
h = np.power(a,n)
h

In [None]:
# performing an operation on a specific slice (here, the first column)

i = np.power(a[:,0],n)
i

In [None]:
# create a 1D array, then multiply it by a constant value

a = np.arange(1,5,1)
a

In [None]:
b = a*2
b

In [None]:
# broadcasting (when the shapes are not the same)

a1 = np.random.randint(0,5, (1,3))
print(a1)
a2 = np.random.randint(0,5,(3,1))
print(a2)

In [None]:
# a1 (1x3) and a2 (3x1) are both stretched to 3x3 before adding
a = a1 + a2
a
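
Broadcasting is easier to follow with fixed values instead of random ones. In this sketch, `row` and `col` are illustrative names: a `(1, 3)` array plus a `(3, 1)` array produces a `(3, 3)` result in which every row value is combined with every column value.

```python
import numpy as np

row = np.array([[0, 1, 2]])        # shape (1, 3)
col = np.array([[0], [10], [20]])  # shape (3, 1)

total = row + col                  # both are broadcast to shape (3, 3)

print(total.shape)  # (3, 3)
print(total)        # rows: [0, 1, 2], [10, 11, 12], [20, 21, 22]
```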

**Data Operations**

In [None]:
x = np.random.randint(0,10,(3,3))
x

In [None]:
# average of all elements

x_avg = np.average(x)
x_avg

In [None]:
# average along a specific dimension (axis=0 averages down each column)

x_avg = np.average(x, axis=0)
x_avg
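
The `axis` argument decides which dimension is collapsed. A small sketch with fixed values (the name `scores` is illustrative) makes the difference between `axis=0` and `axis=1` visible.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

overall = np.average(scores)             # all six values        -> 3.5
per_column = np.average(scores, axis=0)  # collapses the rows    -> 2.5, 3.5, 4.5
per_row = np.average(scores, axis=1)     # collapses the columns -> 2.0, 5.0

print(overall)
print(per_column)
print(per_row)
```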

In [None]:
# maximum value

x_max = np.max(x)
x_max

In [None]:
# minimum value

x_min = np.min(x)
x_min

**Exercise** ❓

Define two 2x2 random NumPy arrays, a1 and a2.

Add the first row of a1 to the second row of a2.

## **2.2 Introduction to Pandas**

Pandas objects can be thought of as enhanced versions of NumPy arrays in which the rows and columns are identified with labels rather than simple integer indices.

In [None]:
# in NumPy arrays we only define the data; we cannot attach labels to it

import numpy as np

data = np.array([0.25, 0.5, 0.75, 1.0])

data

### ***Series*** 

The one-dimensional labeled data type (object) of Pandas.

In [None]:
# Series data type

import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data


In [None]:
data['a']

A Series is similar to a Python dictionary, but with many more features.
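
A few of those extra features can be sketched as follows: membership tests work as with dictionary keys, while label slicing, filtering, and arithmetic go beyond what a dictionary offers. The name `s` is only illustrative.

```python
import pandas as pd

s = pd.Series([0.25, 0.5, 0.75, 1.0],
              index=['a', 'b', 'c', 'd'])

print('a' in s)      # True: membership checks the labels, like dictionary keys
print(s['b':'c'])    # slicing by label includes both end points
print(s[s > 0.5])    # keep only the entries larger than 0.5
print(s * 2)         # arithmetic applies to every value at once
```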

💡 Converting a dictionary to a Pandas Series

In [None]:
# Example of population

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}

population = pd.Series(population_dict)
population

### ***DataFrame***

One or more Series that share the same index labels.

A more practical representation of a dataset!

In [None]:
states = pd.DataFrame({'population': population})

states

In [None]:
# multiple columns in a DataFrame

area_dict = {'California': 423967,
             'Texas': 695662,
             'New York': 141297,
             'Florida': 170312,
             'Illinois': 149995}
area = pd.Series(area_dict)
area

In [None]:
states = pd.DataFrame({'population': population,
                       'area': area})
states

# a DataFrame aligns Series by their index labels and fills missing entries with NaN, without raising an error
# plain Python lists and NumPy arrays are strict about matching dimensions
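
The handling of missing data can be sketched with two Series whose labels only partly overlap. The names `pop_partial`, `area_partial`, and `partial_states` are hypothetical; the values are taken from the dictionaries above.

```python
import pandas as pd

# Each Series is missing one of the states present in the other
pop_partial = pd.Series({'California': 38332521, 'Texas': 26448193})
area_partial = pd.Series({'Texas': 695662, 'Florida': 170312})

partial_states = pd.DataFrame({'population': pop_partial,
                               'area': area_partial})

print(partial_states)               # missing entries show up as NaN
print(partial_states.isna().sum())  # number of missing values per column
```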

Useful attributes and operations of the DataFrame object.

In [None]:
states.index

In [None]:
states.columns

In [None]:
# can get an individual Series (column) from a DataFrame

states['area']
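
Besides selecting a column, a DataFrame can also be queried by row label or by a condition. This is a minimal sketch that rebuilds the same `states` table so it runs on its own; the threshold of 200000 and the name `large_states` are arbitrary choices for illustration.

```python
import pandas as pd

states = pd.DataFrame({
    'population': pd.Series({'California': 38332521, 'Texas': 26448193,
                             'New York': 19651127, 'Florida': 19552860,
                             'Illinois': 12882135}),
    'area': pd.Series({'California': 423967, 'Texas': 695662,
                       'New York': 141297, 'Florida': 170312,
                       'Illinois': 149995}),
})

print(states.loc['Texas'])                     # one row, selected by its label

large_states = states[states['area'] > 200000] # rows that satisfy a condition
print(large_states)
```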

**Exercise** ❓

Add another Series named ***houses*** to the existing DataFrame (you can use any made-up values).

## **2.3 Reading data from files**

We will use the ***California housing dataset*** as an example.

In [None]:
# read a CSV file    --> pd.read_csv
# read an Excel file --> pd.read_excel

df = pd.read_csv('sample_data/california_housing_train.csv')

# read from a tab-separated text file --> pd.read_csv with sep="\t"
# df = pd.read_csv('sample_data/california_housing_test.txt', sep="\t",header=None)
df
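
`pd.read_csv` can also be tried without any file on disk by reading from an in-memory string. The sketch below uses made-up toy values and hypothetical names (`csv_text`, `toy_df`, `tsv_text`, `tsv_df`) purely to show the effect of the `sep` and `header` parameters.

```python
import io
import pandas as pd

# Comma-separated text with a header row (toy values, not real data)
csv_text = "block_id,population\n1,1015\n2,1129\n3,333\n"
toy_df = pd.read_csv(io.StringIO(csv_text))
print(toy_df.shape)           # (3, 2)
print(list(toy_df.columns))   # ['block_id', 'population']

# Tab-separated text without a header row: columns are numbered 0, 1, ...
tsv_text = "1\t1015\n2\t1129\n"
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=None)
print(tsv_df.shape)           # (2, 2)
print(list(tsv_df.columns))   # [0, 1]
```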

Let's play around with our data in Python

In [None]:
# selecting a Series from the DataFrame
# i.e., selecting a column of the file
df['population']

Performing Numerical Operations on the Data.

Pandas ➕ Numpy = ✨

In [None]:
# Convert Pandas Series into Numpy Array

population = np.array(df['population'])

population_average = np.average(population)

population_average
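
Converting with `np.array` is one option; a Series also offers `to_numpy()` and its own statistics methods such as `mean()`, `max()`, and `min()`. The sketch below uses a small made-up Series named `values` to show that both routes agree.

```python
import numpy as np
import pandas as pd

values = pd.Series([10, 20, 30, 40])   # toy values, not from the dataset

as_array = values.to_numpy()           # Pandas Series -> NumPy array
print(type(as_array))
print(np.average(as_array))            # 25.0
print(values.mean())                   # 25.0, computed directly on the Series
print(values.max(), values.min())      # 40 10
```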

In [None]:
print("The maximum per-block population in California is", np.max(population))

***Recalling Function Definition in Python.***

In [173]:
def function(argument):
  argument = argument + 1
  return argument

In [None]:
a = function(5)
print("The value from the function is", a)
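
Functions become more useful when combined with NumPy. As a minimal sketch, the hypothetical helper `value_range` below (not part of the lecture) takes a list or array and returns the difference between its largest and smallest value.

```python
import numpy as np

def value_range(values):
    """Return the difference between the largest and smallest value."""
    values = np.asarray(values)   # accept lists as well as NumPy arrays
    return np.max(values) - np.min(values)

spread = value_range([3, 10, 7])
print("The range of the values is", spread)  # 7
```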

**Exercise** ❓

Write a function which returns the **average number of people per house** from the CSV file.



In [176]:
def analytics(file):
  # write code under this line
  
  return 

In [None]:
file = 'sample_data/california_housing_train.csv'

print('The average number of people per house in California is', analytics(file))