# Introduction to Python Libraries by ORSC
In this notebook, we'll explore some of the most popular Python libraries:
* NumPy
* Pandas
* Matplotlib

This notebook is created by:
* Essaid Zerbout
* Abdelmalik Benfadhil

Link to the slides presentation: https://docs.google.com/presentation/d/1O1WigBg7qUBD2QSxl4XUj4gauytgAdzW7VY5RBaCUmE/edit?usp=sharing

# 1. NumPy
NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides a multidimensional array object and fast operations on arrays, including mathematical and logical operations, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and much more.

In [None]:
# importing numpy
import numpy as np

# np.arange(7) returns the integers 0 through 6
firstnp = np.arange(7)
print(firstnp)


### 1.1 NumPy arrays
In Python, lists serve the purpose of arrays, but they are slow to process.

NumPy provides an array object that can be up to 50x faster than traditional Python lists, depending on the operation and the array size.

The array object in NumPy is called ndarray. It comes with many supporting functions that make it easy to work with.

Arrays are very frequently used in data science, where speed and resources are very important.
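
The speed difference can be measured directly. The cell below is a minimal sketch, assuming an array of 200,000 integers and a simple "multiply every element by 2" task (both chosen only for illustration). The timings it prints depend on the machine, so no particular speedup is guaranteed.

```python
import timeit

import numpy as np

n = 200_000  # illustrative size, not taken from the notebook
py_list = list(range(n))
np_array = np.arange(n)

# Time the same task (doubling every element) with a list and with an ndarray
list_time = timeit.timeit(lambda: [v * 2 for v in py_list], number=5)
array_time = timeit.timeit(lambda: np_array * 2, number=5)

print(f"list:    {list_time:.4f} s")
print(f"ndarray: {array_time:.4f} s")

# Both approaches compute the same values
doubled_list = [v * 2 for v in py_list]
doubled_array = np_array * 2
print(doubled_list[:5], doubled_array[:5])
```

The ndarray version applies the multiplication to the whole array at once instead of looping in Python, which is where most of the gain usually comes from.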

In [None]:
# ndarray type
print(type(firstnp))

In [None]:
# create an ndarray from a tuple or a list (the comparison below is element-wise)
arr = np.array((1,2))
arr2 = np.array([1,2])
print(arr == arr2)

In [None]:
# 0-dimensional array (a scalar)
a = np.array(42)
print(a, "\narray of dimension",a.ndim)
print("and of shape: ",a.shape)

print("\n")
# 1-dimensional array
b = np.array([16,23])
print(b, "\narray of dimension",b.ndim)
print("and of shape: ",b.shape)

print("\n")
# 2-dimensional array
c = np.array([[1,2,3],[4,5,6]])
print(c, "\narray of dimension",c.ndim)
print("and of shape: ",c.shape)

In [None]:
# indexing
print(b[0]) # 0 for first element, -1 for the last
print(c[0,2]) # element of row i and column j is accessed with [i,j]

print("\n")
# slicing
print(b[:]) # [beginning:end]
print(c[:,1:5]) # [beginning:end, beginning:end]; an end past the last index is clipped

print("\n")
# with step
d = np.array([1, 2, 3, 4, 5, 6, 7])
print(d[1::2]) # [beginning:end:step]

In [None]:
# matrix operations

# transpose of matrix
m = np.array([[1,2],[3,4]])
mx = m.T
print(m)
print("\n")
print(" transpose of the matrix: ")
print(mx)

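# element-wise add, subtract, multiply, divide
print("\n matrix summation : \n",np.add(m,mx))
print("\n matrix subtraction: \n",np.subtract(m,mx))
print("\n element-wise multiplication: \n",np.multiply(m,mx))
print("\n element-wise division: \n",np.divide(m,mx))

# matrix product
print("\n matrix product: \n",m@mx) # or np.matmul(); np.dot() gives the same result for 2-D arrays

# cross product of two 3-element vectors (2-element vectors are deprecated in NumPy 2.0)
u = np.array([1, 0, 0])
v = np.array([0, 1, 0])
print("\n cross product: \n",np.cross(u,v))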



### 1.2 Random simulation
Using NumPy, you can generate "random" values and distributions.
It is important to note that the generated values aren't truly random: they are the result of a deterministic algorithm, so we call them "pseudo-random".
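
One consequence of pseudo-randomness is that the same seed reproduces the same values. The cell below is a small sketch of this, assuming NumPy's `default_rng` generator and an arbitrary seed of 42 (neither appears elsewhere in this notebook).

```python
import numpy as np

# Two generators started from the same (arbitrary) seed
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draws_a = rng_a.integers(0, 10, size=5)
draws_b = rng_b.integers(0, 10, size=5)

print(draws_a)
print(draws_b)
print("identical draws:", np.array_equal(draws_a, draws_b))
```

Fixing the seed is useful when you want a simulation or an experiment to give the same result every time it runs.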

In [None]:
# import random module
from numpy import random

# random integer in [0, 10)
x = random.randint(10)
print(x)

print("\n")

# random float in [0, 1)
y = random.rand()
print(y)

print("\n")
# random choice: a 3x5 array sampled from the given values
z = random.choice([3, 4, 5, 7, 9], size=(3, 5))
print(z)



In [None]:
# generate normal distribution
print("normal distribution")
n = random.normal(size=20)
print(n)

print("\n")
# set the mean (loc) and the standard deviation (scale)
print("new normal distribution")
x = random.normal(loc = 4, scale = 2,size = 20)
print(x)

print("\n")
# generate uniform distribution
print("uniform distribution")
x = random.uniform(size=(2, 3)) # optional: low=..., high=... (defaults are 0.0 and 1.0)
print(x)

print("\n")
# generate poisson distribution
print("poisson distribution")
x = random.poisson(lam=2, size=10)
print(x)

print("\n")
# generate binomial distribution
print("binomial distribution")
x = random.binomial(n=10, p=0.5, size=10)
print(x)


In [None]:
# visualize distribution
import matplotlib.pyplot as plt
import seaborn as sns

# we're using seaborn which is built on matplotlib but we'll get back to visualizing distributions later
sns.displot(n, kind="kde").set(title = "normal distribution")
plt.show()

# 2. Pandas
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.
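
As a quick illustration of what cleaning and analyzing look like in practice, here is a minimal sketch on a made-up table. The column names `city` and `temp_c` and all the values are hypothetical and used only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing city and one missing temperature
raw = pd.DataFrame({
    "city": ["Algiers", "Oran", "Oran", None],
    "temp_c": [21.0, np.nan, 24.5, 19.0],
})

# Exploring: count the missing values in each column
missing_per_column = raw.isna().sum()
print(missing_per_column)

# Cleaning: drop rows without a city, fill missing temperatures with the mean
cleaned = raw.dropna(subset=["city"]).fillna({"temp_c": raw["temp_c"].mean()})

# Analyzing: average temperature per city
mean_by_city = cleaned.groupby("city")["temp_c"].mean()
print(mean_by_city)
```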



In [None]:
# importing pandas library
import pandas as pd

In [None]:
# creating a data frame using a dictionary
mydataset = {
  'Name': ["Abdellah", "Malik", "Wiam", "Fazila"],
  'Role': ["Président", "VP", "SG", "SGA"],
  'Height': [174 , 190, 165, 170]
}
df = pd.DataFrame(mydataset)
df
# alternatively, pass custom row labels:
# df = pd.DataFrame(mydataset, index = ["person1", "person2", "person3", "person4"])
# df

In [None]:
# refer to a row by its index label:
print(df.loc[0])
# use a list of index labels:
print(df.loc[[0, 1]])

A Pandas Series is like a column in a table.
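
To make the link between a Series and a table column concrete, the sketch below selects one column from a small DataFrame and checks its type. The DataFrame `people` is a hypothetical two-row example, separate from the `df` used in the other cells.

```python
import pandas as pd

# Hypothetical two-row table
people = pd.DataFrame({"Name": ["Wiam", "Malik"], "Height": [165, 190]})

# Selecting a single column returns a Series
heights = people["Height"]
print(type(heights))

# A Series carries its own index and supports the usual statistics
print(heights.index.tolist())
print(heights.mean())
```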

In [None]:
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
# a Series with custom index labels
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)

In [None]:
# show the first 5 rows
print(df.head())
# show the last 5 rows
print(df.tail())

In [None]:
# add a row (Height stays an integer so the column remains numeric)
df.loc[len(df)]=["Essaid","Head of IT",178]
df

In [None]:
# drop a column (returns a new DataFrame; df itself is unchanged)
df1 = df.drop("Height",axis=1)
df1

In [None]:
#data info
df.info()

In [None]:
# some useful info
df.Height.describe()

In [None]:
# creating a new column
# with a list (one value per row)
df1["Structure"] = ["bureau"] * len(df1)
print(df1)

print("\n")
# with np.array (its length must match the number of rows)
b = np.array(["bureau","IT","formation","comm","IT"])
df1["Structure"] = b
df1

In [None]:
# # to read df from a file
# df = pd.read_csv("file path")
# # to save df as csv (excel,...)
# df.to_csv("nameyourfile.csv")

# 3. Matplotlib
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility. It enables creating static, animated, and interactive visualizations.


### 3.1 Pyplot
Most of the Matplotlib utilities lie under the pyplot submodule, a collection of functions that make Matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc.
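
The sequence described above can be sketched step by step, with one pyplot call per change to the figure. The sine curve and the figure size are arbitrary choices made for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig = plt.figure(figsize=(6, 3))   # creates a figure
ax = plt.subplot(1, 1, 1)          # creates a plotting area in the figure
lines = plt.plot(x, np.sin(x))     # plots a line in the plotting area
plt.title("sin(x)")                # decorates the plot with a title
plt.xlabel("x")                    # ... and with axis labels
plt.ylabel("sin(x)")
plt.show()
```

Each call acts on the "current" figure and plotting area, which is why none of the decorating functions needs to be told which plot to modify.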

In [None]:
# importing the submodule pyplot
import matplotlib.pyplot as plt

In [None]:
# single plot
xpoints = np.array([1, 2, 6, 8]) # if x is omitted, it defaults to 0, 1, 2, ...
ypoints = np.array([3, 8, 1, 10])

plt.plot(xpoints, ypoints, marker = "*", c="b") # alternatives: marker = 'o', color = 'r', or a format string such as 'o--g'
# Labeling
plt.title("first plot")
plt.xlabel("Xpoints")
plt.ylabel("Ypoints")
# Adding a grid
plt.grid(axis = 'y') # axis = 'x', 'y' or 'both'
plt.show()

In [None]:
# Multiple plots
x1 = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])
x2 = np.array([0, 1, 2, 3])
y2 = np.array([6, 2, 7, 11])

plt.plot(x1, y1, x2, y2)
plt.show()

# or we can call plt.plot once per line
plt.plot(x1, y1, 'b')
plt.plot(x2, y2, 'k')
plt.show()

### 3.2 Scatter plot

In [None]:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])

plt.scatter(x, y)
plt.show()


In [None]:
x1 = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y1 = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x1, y1)
plt.show()

sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75,100,15])

x2 = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y2 = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x2, y2, color = 'r', s=sizes) # s sets one marker size per point
plt.show()

In [None]:
# combine random distributions with plotting in matplotlib
print("The de Moivre-Laplace theorem (a special case of the Central Limit Theorem) says that as n increases,\nthe binomial distribution with n trials and probability p of success gets closer and closer to a normal distribution.\nThat is, the binomial probability of any event gets closer and closer to the normal probability of the same event.")
print("We can visualize this result by drawing many samples from a binomial distribution: the histogram shows the bell curve shape of the normal distribution, and the match improves as the number of trials n increases")
x = random.binomial(n=10, p=0.5, size=1000000)
plt.hist(x, bins=np.arange(12) - 0.5) # one bin per possible outcome 0..10
plt.show()
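
The statement printed in the cell above can also be checked numerically. The sketch below compares the exact Binomial(n, p) probabilities with the density of a normal distribution that has the same mean np and variance np(1-p), for a few values of n. The helper name `max_gap_to_normal`, the values of n, and the choice of the largest absolute gap as the measure are assumptions made for this illustration, not part of the theorem's statement.

```python
import math

import numpy as np


def max_gap_to_normal(n, p=0.5):
    """Largest absolute gap between the Binomial(n, p) pmf and the matching normal pdf."""
    k = np.arange(n + 1)
    # Exact binomial probabilities P(X = k)
    pmf = np.array([math.comb(n, int(i)) * p**int(i) * (1 - p)**(n - int(i)) for i in k])
    # Normal density with the same mean and standard deviation
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    pdf = np.exp(-0.5 * ((k - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return float(np.max(np.abs(pmf - pdf)))


gaps = {n: max_gap_to_normal(n) for n in (10, 50, 250)}
for n, gap in gaps.items():
    print(f"n = {n:>3}: max |pmf - pdf| = {gap:.6f}")
```

The gap shrinks as n grows, which is the behaviour the theorem describes.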