This notebook is a hands-on guided project that I completed on ***COURSERA***. It covers the basic operations of two important Python libraries, NumPy and Pandas.

***NumPy*** and ***Pandas*** are two of the most widely used Python libraries in data science. They offer high-performance, easy-to-use data structures and data analysis tools.

**GUIDED PROJECT LINK:** https://www.coursera.org/projects/python-for-data-analysis-numpy

**ESTIMATED TIME: 1 Hour**

# TASK 1: DEFINE SINGLE AND MULTI-DIMENSIONAL NUMPY ARRAYS

In [1]:
# NumPy is a linear algebra library used for multidimensional arrays
# NumPy brings the best of two worlds: (1) C/Fortran computational efficiency, (2) Python's easy syntax

# Let's define a one-dimensional list
# (named "my_list" so that the built-in list type is not shadowed)
my_list = [10,20,30,40,50,60,70]
my_list

[10, 20, 30, 40, 50, 60, 70]

In [2]:
# Let's create a numpy array from the list "my_list"
import numpy as np
# Pass the list directly; wrapping it in extra brackets would create a 1x7 two-dimensional array
array = np.array(my_list)
array

array([10, 20, 30, 40, 50, 60, 70])

In [3]:
type(array)

numpy.ndarray

In [4]:
# Multi-dimensional (Matrix definition) 
multi_dim = np.array([[1,2,3],[4,5,6]])
multi_dim

array([[1, 2, 3],
       [4, 5, 6]])
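A quick way to tell a one-dimensional array from a matrix is to inspect its `ndim` and `shape` attributes. The sketch below uses small illustrative arrays; the names `one_d` and `two_d` are assumptions and do not come from the course material.

```python
import numpy as np

# Illustrative arrays (hypothetical names, not from the course)
one_d = np.array([10, 20, 30])
two_d = np.array([[1, 2, 3], [4, 5, 6]])

# ndim is the number of axes, shape is the size along each axis
print(one_d.ndim, one_d.shape)  # 1 (3,)
print(two_d.ndim, two_d.shape)  # 2 (2, 3)
```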

MINI CHALLENGE #1: 
- Write code that creates the following 2x4 numpy array

```
[[3 7 9 3] 
[4 3 2 2]]
```

In [5]:
arr1 = np.array([[3, 7, 9, 3],[4, 3, 2, 2]])
print(arr1)

[[3 7 9 3]
 [4 3 2 2]]


# TASK 2: LEVERAGE NUMPY BUILT-IN METHODS AND FUNCTIONS 

In [6]:
# "rand()" draws samples from a uniform distribution over [0, 1)
random = np.random.rand(20)
random

array([0.13205105, 0.12598439, 0.71931722, 0.89858805, 0.84335662,
       0.40380996, 0.3493284 , 0.96717384, 0.7601122 , 0.84171115,
       0.92644072, 0.68800298, 0.41135866, 0.81063481, 0.56869696,
       0.80268356, 0.43245474, 0.81221282, 0.90459813, 0.40378581])

In [7]:
# You can create a matrix of random numbers as well
x = np.random.rand(3,3)
x

array([[0.77719628, 0.29332427, 0.96167143],
       [0.97263628, 0.42489768, 0.07501783],
       [0.12434322, 0.1334155 , 0.70558259]])

In [8]:
# "randint" generates random integers from the lower bound (inclusive) up to the upper bound (exclusive)
num = np.random.randint(1,50)
num

17

In [9]:
# "randint" can also generate a given number of random integers, as follows
arr = np.random.randint(1,100,17)
arr

array([70, 92, 28, 39, 10, 38, 47, 36, 38, 73, 85, 80, 72, 72, 83, 80, 78])
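The random outputs above change on every run. If reproducible numbers are needed, one option is NumPy's `Generator` interface with a fixed seed. This is a minimal sketch rather than part of the course; the seed value 42 and all variable names are arbitrary assumptions.

```python
import numpy as np

# Seeded generator: the same seed yields the same sequence (seed chosen arbitrarily)
rng = np.random.default_rng(seed=42)

uniform = rng.random(5)                  # 5 floats drawn from [0, 1)
integers = rng.integers(1, 100, size=5)  # 5 integers from 1 (inclusive) to 100 (exclusive)

# A second generator with the same seed reproduces the first draw
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random(5)))  # True
```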

In [10]:
# np.arange creates evenly spaced values within a given interval (the stop value is excluded)
x = np.arange(1,50)
x

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
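`np.arange` also accepts a step, and `np.linspace` is a related function that takes the number of points instead of a step. A short sketch with illustrative values (the variable names are assumptions):

```python
import numpy as np

# arange(start, stop, step): stop is excluded
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# linspace(start, stop, num): stop is included by default
points = np.linspace(0, 1, 5)
print(points)  # values 0, 0.25, 0.5, 0.75 and 1
```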

In [11]:
# Create an identity matrix: ones on the diagonal and zeros everywhere else
i = np.eye(5)
i

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [12]:
# Matrix of ones
one = np.ones((3,3))
one

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [13]:
# Array of zeros
zero = np.zeros((3,3))
zero

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

MINI CHALLENGE #2:
- Write code that takes in a positive integer "x" from the user and creates a 1x10 array with random numbers ranging from 0 to "x"

In [14]:
x = int(input())
# randint excludes the upper bound, so use x + 1 to cover the range 0 to x inclusive
arr = np.random.randint(0, x + 1, 10)
arr

 100


array([ 5, 88, 23, 89, 63, 94, 88, 17, 61, 35])

# TASK 3: PERFORM MATHEMATICAL OPERATIONS IN NUMPY

In [15]:
# np.arange() returns evenly spaced values within a given interval
x = np.arange(1,10)
x

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [16]:
y = np.arange(1,10)
y

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
# Add two numpy arrays together element-wise
# (named "total" so that the built-in sum() is not shadowed)
total = x+y
total

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

In [18]:
# Square every element
square = x**2
square

array([ 1,  4,  9, 16, 25, 36, 49, 64, 81])

In [19]:
# Element-wise square root
sqrt = np.sqrt(square)
sqrt

array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [20]:
# Element-wise exponential (e raised to each element of y)
z = np.exp(y)
z

array([2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01,
       1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
       8.10308393e+03])

MINI CHALLENGE #3:
- Given the X and Y values below, obtain the distance between them

```
X = [5, 7, 20]
Y = [9, 15, 4]
```

In [21]:
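X = np.array([5, 7, 20])
Y = np.array([9, 15, 4])
# Treat each (X[i], Y[i]) pair as a point and compute its distance from the origin: sqrt(x^2 + y^2)
a = X**2 + Y**2
dist = np.sqrt(a)
dist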

array([10.29563014, 16.55294536, 20.39607805])
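The challenge wording can also be read differently: if `X` and `Y` are taken as two three-dimensional vectors, the distance between them is the Euclidean norm of their difference. The sketch below shows that alternative reading only; it is an assumption about the intended meaning, not the course's solution.

```python
import numpy as np

X = np.array([5, 7, 20])
Y = np.array([9, 15, 4])

# Euclidean distance between the two vectors: sqrt(sum((X - Y)**2))
distance = np.linalg.norm(X - Y)

# The same quantity written out step by step
manual = np.sqrt(np.sum((X - Y) ** 2))

print(distance)  # square root of 336, roughly 18.33
```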

# TASK 4: PERFORM ARRAY SLICING AND INDEXING

In [22]:
np_array = np.array([3,5,6,2,8,10,20,50])
np_array

array([ 3,  5,  6,  2,  8, 10, 20, 50])

In [23]:
# Access specific index from the numpy array
np_array[0]

3

In [24]:
# Slice from index 0 up to, but NOT including, index 8 (here that covers the whole array)
np_array[0:8]

array([ 3,  5,  6,  2,  8, 10, 20, 50])

In [25]:
# Broadcasting: assign one value to several elements of a numpy array at once
np_array[0:4] = 7 
np_array

array([ 7,  7,  7,  7,  8, 10, 20, 50])
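One detail worth keeping in mind when assigning through slices: a basic NumPy slice is a view of the original data, not a copy. The sketch below illustrates this with a small made-up array (all names are illustrative assumptions).

```python
import numpy as np

original = np.array([3, 5, 6, 2, 8])

# A basic slice is a view: writing to it changes the original array
view = original[0:2]
view[:] = 0
print(original)  # [0 0 6 2 8]

# .copy() gives independent data: writing to it leaves the original untouched
safe = original[2:4].copy()
safe[:] = -1
print(original)  # [0 0 6 2 8]
```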

In [26]:
# Let's define a two dimensional numpy array
matrix = np.random.randint(1,10,(3,3))
matrix

array([[3, 9, 4],
       [9, 9, 7],
       [8, 6, 9]])

In [27]:
# Get a row from a matrix (index -1 selects the last row)
matrix[-1]

array([8, 6, 9])

In [28]:
# Get one element (row 1, column 2)
matrix[1, 2]

7
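Two-dimensional arrays can also be indexed with a single `[row, column]` pair, and slices work along each axis. A minimal sketch, reusing the values of the random matrix printed above as a fixed array so that the results are predictable:

```python
import numpy as np

# Fixed copy of the matrix shown above (the original was generated randomly)
matrix = np.array([[3, 9, 4],
                   [9, 9, 7],
                   [8, 6, 9]])

element = matrix[1, 2]    # row 1, column 2
column = matrix[:, 1]     # every row, column 1
block = matrix[0:2, 1:3]  # rows 0-1, columns 1-2

print(element)  # 7
print(column)   # [9 9 6]
print(block)
```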

MINI CHALLENGE #4:
- In the following matrix, replace the last row with 0

```
X = [2 30 20 -2 -4]
    [3 4  40 -3 -2]
    [-3 4 -6 90 10]
    [25 45 34 22 12]
    [13 24 22 32 37]
```

In [29]:
X = np.array([[2, 30, 20, -2, -4] , [3, 4, 40, -3, -2] , [-3, 4, -6, 90, 10] , [25, 45, 34, 22, 12] , [13, 24, 22, 32, 37]])
X[4] = 0
X

array([[ 2, 30, 20, -2, -4],
       [ 3,  4, 40, -3, -2],
       [-3,  4, -6, 90, 10],
       [25, 45, 34, 22, 12],
       [ 0,  0,  0,  0,  0]])

# TASK 5: PERFORM ELEMENT SELECTION (CONDITIONAL)

In [30]:
matrix = np.random.randint(1,10,(5,5))
matrix

array([[7, 2, 3, 9, 3],
       [2, 9, 8, 1, 5],
       [8, 7, 3, 6, 3],
       [3, 6, 3, 8, 5],
       [7, 1, 1, 1, 7]])

In [31]:
# Keep only the elements greater than 7 (a boolean mask returns a flat 1-D array)
new_matrix = matrix[matrix>7]
new_matrix

array([9, 9, 8, 8, 8])

In [32]:
# Obtain odd elements only
odd_matrix = matrix[matrix%2 != 0]
odd_matrix

array([7, 3, 9, 3, 9, 1, 5, 7, 3, 3, 3, 3, 5, 7, 1, 1, 1, 7])

MINI CHALLENGE #5:
- In the following matrix, replace negative elements by 0 and replace odd elements with -2


```
X = [2 30 20 -2 -4]
    [3 4  40 -3 -2]
    [-3 4 -6 90 10]
    [25 45 34 22 12]
    [13 24 22 32 37]
```


In [33]:
X = np.array([[2, 30, 20, -2, -4] , [3, 4, 40, -3, -2] , [-3, 4, -6, 90, 10] , [25, 45, 34, 22, 12] , [13, 24, 22, 32, 37]])
X

array([[ 2, 30, 20, -2, -4],
       [ 3,  4, 40, -3, -2],
       [-3,  4, -6, 90, 10],
       [25, 45, 34, 22, 12],
       [13, 24, 22, 32, 37]])

In [34]:
# Replace negative elements with 0 first, then replace the remaining odd elements with -2
X[X<0] = 0
X[X%2 == 1] = -2
X

array([[ 2, 30, 20,  0,  0],
       [-2,  4, 40,  0,  0],
       [ 0,  4,  0, 90, 10],
       [-2, -2, 34, 22, 12],
       [-2, 24, 22, 32, -2]])
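The masked assignments above modify `X` in place. When the original array should stay untouched, `np.where` can build a new array instead. This is an alternative sketch, not the course's solution; the names `no_negatives` and `result` are illustrative.

```python
import numpy as np

X = np.array([[2, 30, 20, -2, -4],
              [3, 4, 40, -3, -2],
              [-3, 4, -6, 90, 10],
              [25, 45, 34, 22, 12],
              [13, 24, 22, 32, 37]])

# np.where(condition, value_if_true, value_if_false) returns a new array
no_negatives = np.where(X < 0, 0, X)
result = np.where(no_negatives % 2 == 1, -2, no_negatives)

print(result)
print(X[0, 3])  # -2, the original array is unchanged
```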

# TASK 6: UNDERSTAND PANDAS FUNDAMENTALS

In [35]:
# Pandas is a data manipulation and analysis tool that is built on NumPy.
# Pandas uses a data structure known as a DataFrame (think of it as Microsoft Excel in Python).
# DataFrames let programmers store and manipulate data in a tabular fashion (rows and columns).
# Series vs. DataFrame: a Series can be thought of as a single column of a DataFrame.

In [36]:
import pandas as pd

In [37]:
# Let's define a two-dimensional Pandas DataFrame
# Note that you can create a pandas dataframe from a python dictionary
bank_client_df = pd.DataFrame({'Bank Client ID':[111,222,333,444],
                               'Bank Client Name':['Chanel','Steve','Mitch','Ryan'],
                              'Net Worth':[3500,29000,10000,2000],
                              'Years with Bank':[3,4,9,5]})
bank_client_df

Unnamed: 0,Bank Client ID,Bank Client Name,Net Worth,Years with Bank
0,111,Chanel,3500,3
1,222,Steve,29000,4
2,333,Mitch,10000,9
3,444,Ryan,2000,5


In [38]:
# Let's obtain the data type 
type(bank_client_df)

pandas.core.frame.DataFrame
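The Series versus DataFrame distinction shows up directly when selecting columns: single brackets return a Series, while a list of column names returns a DataFrame. A small sketch using two of the clients defined above (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Bank Client Name': ['Chanel', 'Steve'],
                   'Net Worth': [3500, 29000]})

column = df['Net Worth']       # single brackets: a Series
sub_frame = df[['Net Worth']]  # list of names: a one-column DataFrame

print(type(column).__name__)     # Series
print(type(sub_frame).__name__)  # DataFrame
```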

In [39]:
# You can view only the first few rows using .head()
bank_client_df.head(2)

Unnamed: 0,Bank Client ID,Bank Client Name,Net Worth,Years with Bank
0,111,Chanel,3500,3
1,222,Steve,29000,4


In [40]:
# You can view only the last few rows using .tail()
bank_client_df.tail(2)

Unnamed: 0,Bank Client ID,Bank Client Name,Net Worth,Years with Bank
2,333,Mitch,10000,9
3,444,Ryan,2000,5


MINI CHALLENGE #6:
- A portfolio contains a collection of securities such as stocks, bonds and ETFs. Define a dataframe named 'portfolio_df' that holds 3 different stock ticker symbols, number of shares, and price per share (feel free to choose any stocks)
- Calculate the total value of the portfolio including all stocks

In [41]:
portfolio_df = pd.DataFrame({'stock ticker symbol':['AAPL','AMSE','RDRT'],
                            'price per share [$]':[3500, 200, 60],
                            'number of stocks':[3,4,7]})
portfolio_df

Unnamed: 0,stock ticker symbol,price per share [$],number of stocks
0,AAPL,3500,3
1,AMSE,200,4
2,RDRT,60,7


In [42]:
stocks_dollar_value = portfolio_df['price per share [$]'] * portfolio_df['number of stocks']
stocks_dollar_value

0    10500
1      800
2      420
dtype: int64

In [43]:
stocks_dollar_value.sum()

11720

# TASK 7: PANDAS WITH CSV/HTML DATA

In [44]:
# Pandas can read a file (for example a CSV file or an HTML table) and store the data in a DataFrame
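For the CSV side of this task, `pd.read_csv` accepts a file path or any file-like object. The sketch below reads CSV text from an in-memory buffer so that it runs without an external file; the CSV content reuses two clients from the earlier DataFrame, and the names `csv_text` and `clients_df` are assumptions.

```python
import io
import pandas as pd

# In-memory CSV text standing in for a file on disk (illustrative data)
csv_text = """Bank Client ID,Bank Client Name,Net Worth,Years with Bank
111,Chanel,3500,3
222,Steve,29000,4
"""

# read_csv accepts a path or a file-like object such as StringIO
clients_df = pd.read_csv(io.StringIO(csv_text))

print(clients_df.shape)  # (2, 4)
print(clients_df.dtypes)
```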

MINI CHALLENGE #7:
- Write code that uses Pandas to read tabular US retirement data
- You can use data from here: https://www.ssa.gov/oact/progdata/nra.html 

In [45]:
import pandas as pd
# read_html returns a list of DataFrames, one for each table found on the page
us_df = pd.read_html('https://www.ssa.gov/oact/progdata/nra.html')
us_df

[                                        Year of birth  \
 0                                      1937 and prior   
 1                                                1938   
 2                                                1939   
 3                                                1940   
 4                                                1941   
 5                                                1942   
 6                                             1943-54   
 7                                                1955   
 8                                                1956   
 9                                                1957   
 10                                               1958   
 11                                               1959   
 12                                     1960 and later   
 13  Notes: 1. Persons born on January 1 of any yea...   
 
                                                   Age  
 0                                                  65  
 1            

# TASK 8: PANDAS OPERATIONS

In [46]:
# Let's define a dataframe as follows:
bank_client_df = pd.DataFrame({'Bank Client ID':[111,222,333,444],
                               'Bank Client Name':['Chanel','Steve','Mitch','Ryan'],
                              'Net Worth':[3500,29000,10000,2000],
                              'Years with Bank':[3,4,9,5]})
bank_client_df

Unnamed: 0,Bank Client ID,Bank Client Name,Net Worth,Years with Bank
0,111,Chanel,3500,3
1,222,Steve,29000,4
2,333,Mitch,10000,9
3,444,Ryan,2000,5


In [47]:
# Pick the rows that satisfy a given criterion (boolean mask)
df_loyal = bank_client_df[bank_client_df['Years with Bank']>=5]
df_loyal

Unnamed: 0,Bank Client ID,Bank Client Name,Net Worth,Years with Bank
2,333,Mitch,10000,9
3,444,Ryan,2000,5
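Several criteria can be combined with `&` (and) or `|` (or), as long as each condition is wrapped in parentheses. A minimal sketch on the same client data; the combined filter and the name `loyal_and_wealthy` are illustrative assumptions.

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth': [3500, 29000, 10000, 2000],
                               'Years with Bank': [3, 4, 9, 5]})

# Each condition needs its own parentheses because & binds tighter than >=
loyal_and_wealthy = bank_client_df[(bank_client_df['Years with Bank'] >= 5)
                                   & (bank_client_df['Net Worth'] >= 5000)]

print(loyal_and_wealthy['Bank Client Name'].tolist())  # ['Mitch']
```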


In [48]:
# Delete a column from a DataFrame
del bank_client_df['Bank Client ID']
bank_client_df

Unnamed: 0,Bank Client Name,Net Worth,Years with Bank
0,Chanel,3500,3
1,Steve,29000,4
2,Mitch,10000,9
3,Ryan,2000,5
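`del` removes the column from the DataFrame itself. If the original should be kept, `DataFrame.drop` returns a new DataFrame without the column. A short sketch (the name `trimmed_df` is an assumption):

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth': [3500, 29000, 10000, 2000],
                               'Years with Bank': [3, 4, 9, 5]})

# drop() returns a new DataFrame; bank_client_df keeps all four columns
trimmed_df = bank_client_df.drop(columns=['Bank Client ID'])

print(trimmed_df.columns.tolist())
print('Bank Client ID' in bank_client_df.columns)  # True
```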


MINI CHALLENGE #8:
- Using the "bank_client_df" DataFrame, use pandas operations to select only high net worth individuals (minimum $5000)
- What is the combined networth for all customers with 5000+ networth?

In [49]:
high_networth_df = bank_client_df[bank_client_df['Net Worth']>=5000]
high_networth_df

Unnamed: 0,Bank Client Name,Net Worth,Years with Bank
1,Steve,29000,4
2,Mitch,10000,9


In [50]:
# Sum the net worth of the filtered (5000+) clients only, not of all clients
high_networth_df['Net Worth'].sum()

39000

# TASK 9: PANDAS WITH FUNCTIONS

In [51]:
# Let's define a dataframe as follows:
bank_client_df = pd.DataFrame({'Bank client ID':[111, 222, 333, 444], 
                               'Bank Client Name':['Chanel', 'Steve', 'Mitch', 'Ryan'], 
                               'Net worth [$]':[3500, 29000, 10000, 2000], 
                               'Years with bank':[3, 4, 9, 5]})
bank_client_df

Unnamed: 0,Bank client ID,Bank Client Name,Net worth [$],Years with bank
0,111,Chanel,3500,3
1,222,Steve,29000,4
2,333,Mitch,10000,9
3,444,Ryan,2000,5


In [52]:
# Define a function that increases every client's net worth (stocks) by a fixed 20% (for simplicity)
def networth_update(balance):
    return balance * 1.2

In [53]:
# You can apply a function to every value in a DataFrame column
bank_client_df['Net worth [$]'].apply(networth_update)

0     4200.0
1    34800.0
2    12000.0
3     2400.0
Name: Net worth [$], dtype: float64

In [54]:
# Built-in functions work too: get the length of each client name
bank_client_df['Bank Client Name'].apply(len)

0    6
1    5
2    5
3    4
Name: Bank Client Name, dtype: int64
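For simple arithmetic and string lengths, pandas also offers vectorized equivalents that avoid calling a Python function per row. The sketch below compares both styles on the same data; it is an optional alternative and not part of the course solution.

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]': [3500, 29000, 10000, 2000]})

def networth_update(balance):
    return balance * 1.2

# Row-by-row with apply() versus a single vectorized expression
applied = bank_client_df['Net worth [$]'].apply(networth_update)
vectorized = bank_client_df['Net worth [$]'] * 1.2

# String length through the .str accessor instead of apply(len)
name_lengths = bank_client_df['Bank Client Name'].str.len()

print(name_lengths.tolist())  # [6, 5, 5, 4]
```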

MINI CHALLENGE #9:
- Define a function that triples the stock prices and adds $200
- Apply the function to the DataFrame
- Calculate the updated total networth of all clients combined

In [55]:
def networth_update(balance):
    return balance * 3 + 200

In [56]:
result = bank_client_df['Net worth [$]'].apply(networth_update)
result

0    10700
1    87200
2    30200
3     6200
Name: Net worth [$], dtype: int64

In [57]:
# Sum the updated values returned by apply(); the original column is unchanged
result.sum()

134300

# TASK 10: PERFORM SORTING AND ORDERING IN PANDAS

In [58]:
# Let's define a dataframe as follows:
bank_client_df = pd.DataFrame({'Bank client ID':[111, 222, 333, 444], 
                               'Bank Client Name':['Chanel', 'Steve', 'Mitch', 'Ryan'], 
                               'Net worth [$]':[3500, 29000, 10000, 2000], 
                               'Years with bank':[3, 4, 9, 5]})
bank_client_df

Unnamed: 0,Bank client ID,Bank Client Name,Net worth [$],Years with bank
0,111,Chanel,3500,3
1,222,Steve,29000,4
2,333,Mitch,10000,9
3,444,Ryan,2000,5


In [59]:
# You can sort the values in the dataframe according to number of years with bank
bank_client_df.sort_values(by = 'Years with bank')

Unnamed: 0,Bank client ID,Bank Client Name,Net worth [$],Years with bank
0,111,Chanel,3500,3
1,222,Steve,29000,4
3,444,Ryan,2000,5
2,333,Mitch,10000,9


In [60]:
# Note that sort_values() on its own returns a sorted copy; bank_client_df itself is not changed
# Set inplace = True to sort bank_client_df itself
# After this call the new ordering is stored in bank_client_df
bank_client_df.sort_values(by = 'Years with bank',inplace=True)
bank_client_df

Unnamed: 0,Bank client ID,Bank Client Name,Net worth [$],Years with bank
0,111,Chanel,3500,3
1,222,Steve,29000,4
3,444,Ryan,2000,5
2,333,Mitch,10000,9
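Sorting keeps the original row labels, which is why the index above reads 0, 1, 3, 2. The sketch below sorts in descending order and renumbers the rows; sorting by net worth and the name `sorted_df` are illustrative choices, not part of the course.

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

# ascending=False sorts from largest to smallest;
# reset_index(drop=True) renumbers the rows 0..n-1 and discards the old labels
sorted_df = (bank_client_df
             .sort_values(by='Net worth [$]', ascending=False)
             .reset_index(drop=True))

print(sorted_df['Bank Client Name'].tolist())  # ['Steve', 'Mitch', 'Chanel', 'Ryan']
```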


# TASK 11: PERFORM CONCATENATING AND MERGING WITH PANDAS

In [61]:
df1 = pd.DataFrame({'A':['A0','A1','A2','A3'],'B':['B0','B1','B2','B3'],
                   'C':['C0','C1','C2','C3'],'D':['D0','D1','D2','D3']}, index = [0,1,2,3])
df1

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [62]:
df2 = pd.DataFrame({'A':['A4','A5','A6','A7'],'B':['B4','B5','B6','B7'],
                   'C':['C4','C5','C6','C7'],'D':['D4','D5','D6','D7']}, index = [4,5,6,7])
df2

Unnamed: 0,A,B,C,D
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


In [63]:
df3 = pd.DataFrame({'A':['A8','A9','A10','A11'],'B':['B8','B9','B10','B11'],
                   'C':['C8','C9','C10','C11'],'D':['D8','D9','D10','D11']}, index = [8,9,10,11])
df3

Unnamed: 0,A,B,C,D
8,A8,B8,C8,D8
9,A9,B9,C9,D9
10,A10,B10,C10,D10
11,A11,B11,C11,D11


In [64]:
# Stack the three DataFrames on top of each other (row-wise)
pd.concat([df1,df2,df3])

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7
8,A8,B8,C8,D8
9,A9,B9,C9,D9
10,A10,B10,C10,D10
11,A11,B11,C11,D11


# TASK 12: PROJECT AND CONCLUDING REMARKS

- Define a dataframe named 'Bank_df_1' that contains the first and last names of 5 bank clients with IDs = 1, 2, 3, 4, 5
- Assume that the bank got 5 new clients; define another dataframe named 'Bank_df_2' that contains the new clients with IDs = 6, 7, 8, 9, 10
- Let's assume we obtained additional information (Annual Salary) about all our bank customers (10 customers)
- Concatenate both 'Bank_df_1' and 'Bank_df_2' dataframes
- Merge client names and their newly added salary information using the 'Bank Client ID'
- Let's assume that you became a new client of the bank
- Define a new DataFrame that contains your information such as client ID (choose 11), first name, last name, and annual salary.
- Add this new dataframe to the original dataframe 'bank_df_all'.

In [65]:
raw_data = {'Bank Client ID': ['1','2','3','4','5'],
            'First Name':['Sheema','Ravi','Ram','Tessa','Guru'],
            'Last Name':['Rao','Takur','Kumar','Andrew','Nath']}
Bank_df_1 = pd.DataFrame(raw_data, columns = ['Bank Client ID','First Name','Last Name'])
Bank_df_1

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Sheema,Rao
1,2,Ravi,Takur
2,3,Ram,Kumar
3,4,Tessa,Andrew
4,5,Guru,Nath


In [66]:
raw_data = {'Bank Client ID': ['6','7','8','9','10'],
            'First Name':['Sheethal','Rajesh','Raghu','Arun','Ravi'],
            'Last Name':['Sharma','Thamran','Ram','Vijay','Shastri']}
Bank_df_2 = pd.DataFrame(raw_data, columns = ['Bank Client ID','First Name','Last Name'])
Bank_df_2

Unnamed: 0,Bank Client ID,First Name,Last Name
0,6,Sheethal,Sharma
1,7,Rajesh,Thamran
2,8,Raghu,Ram
3,9,Arun,Vijay
4,10,Ravi,Shastri


In [67]:
raw_data = {'Bank Client ID': ['1','2','3','4','5','6','7','8','9','10'],
            'Annual Salary ($/Year)': [25000, 30000, 54000, 87000, 32000, 76000, 56000, 43000, 27000, 37000]}
bank_df_salary = pd.DataFrame(raw_data, columns = ['Bank Client ID','Annual Salary ($/Year)'])
bank_df_salary

Unnamed: 0,Bank Client ID,Annual Salary ($/Year)
0,1,25000
1,2,30000
2,3,54000
3,4,87000
4,5,32000
5,6,76000
6,7,56000
7,8,43000
8,9,27000
9,10,37000


In [68]:
# Concatenate the two client DataFrames (each keeps its own 0-4 row labels)
Bank_df_all = pd.concat([Bank_df_1, Bank_df_2])
Bank_df_all

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Sheema,Rao
1,2,Ravi,Takur
2,3,Ram,Kumar
3,4,Tessa,Andrew
4,5,Guru,Nath
0,6,Sheethal,Sharma
1,7,Rajesh,Thamran
2,8,Raghu,Ram
3,9,Arun,Vijay
4,10,Ravi,Shastri
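The row labels 0 to 4 appear twice in the output above because `pd.concat` keeps each input's index by default. Passing `ignore_index=True` renumbers the rows instead. A minimal sketch with two shortened client tables (the frames `first` and `second` are illustrative subsets):

```python
import pandas as pd

first = pd.DataFrame({'Bank Client ID': ['1', '2'],
                      'First Name': ['Sheema', 'Ravi']})
second = pd.DataFrame({'Bank Client ID': ['6', '7'],
                       'First Name': ['Sheethal', 'Rajesh']})

# Default behaviour: the original labels are kept, so they repeat
stacked = pd.concat([first, second])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True assigns a fresh 0..n-1 index
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```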


In [69]:
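# Merge names with salaries on the shared 'Bank Client ID' column (the IDs are strings in both DataFrames)
bank_df_all = pd.merge(Bank_df_all, bank_df_salary, on = 'Bank Client ID')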
bank_df_all

Unnamed: 0,Bank Client ID,First Name,Last Name,Annual Salary ($/Year)
0,1,Sheema,Rao,25000
1,2,Ravi,Takur,30000
2,3,Ram,Kumar,54000
3,4,Tessa,Andrew,87000
4,5,Guru,Nath,32000
5,6,Sheethal,Sharma,76000
6,7,Rajesh,Thamran,56000
7,8,Raghu,Ram,43000
8,9,Arun,Vijay,27000
9,10,Ravi,Shastri,37000
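By default `pd.merge` performs an inner join, so a client without a matching salary row would silently disappear. The sketch below uses a hypothetical case where one salary is missing to show `how='left'` and `indicator=True`; the reduced tables are assumptions made for illustration.

```python
import pandas as pd

names = pd.DataFrame({'Bank Client ID': ['1', '2'],
                      'First Name': ['Sheema', 'Ravi']})
# Hypothetical situation: salary information exists for client 1 only
salaries = pd.DataFrame({'Bank Client ID': ['1'],
                         'Annual Salary ($/Year)': [25000]})

# Inner join (default): only IDs present in both tables survive
inner = pd.merge(names, salaries, on='Bank Client ID')
print(len(inner))  # 1

# Left join: keep every client, fill missing salaries with NaN,
# and record where each row came from in the '_merge' column
left = pd.merge(names, salaries, on='Bank Client ID', how='left', indicator=True)
print(left['_merge'].tolist())  # ['both', 'left_only']
```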


In [70]:
new_client = {'Bank Client ID':['11'],
             'First Name':['Jeyasri'],
             'Last Name':['Senthil'],
             'Annual Salary ($/Year)':[100000]}
new_client = pd.DataFrame(new_client, index = [10])
new_client

Unnamed: 0,Bank Client ID,First Name,Last Name,Annual Salary ($/Year)
10,11,Jeyasri,Senthil,100000


In [71]:
# Assign the result so the new client is actually added to bank_df_all
bank_df_all = pd.concat([bank_df_all, new_client])
bank_df_all

Unnamed: 0,Bank Client ID,First Name,Last Name,Annual Salary ($/Year)
0,1,Sheema,Rao,25000
1,2,Ravi,Takur,30000
2,3,Ram,Kumar,54000
3,4,Tessa,Andrew,87000
4,5,Guru,Nath,32000
5,6,Sheethal,Sharma,76000
6,7,Rajesh,Thamran,56000
7,8,Raghu,Ram,43000
8,9,Arun,Vijay,27000
9,10,Ravi,Shastri,37000
10,11,Jeyasri,Senthil,100000
