# Bridging - Coding with Jupyter Notebook


0. Markdowns, Headings, Comments
1. Built-in modules, third party packages 
2. Arrays, Dictionary and Data frames 

You may go to `View` -> `Table of Contents` to see the flow of content clearly. 

## 1. Built-in Python Modules and Third-party Packages

### 1.1 Built-in modules 

You can import built-in Python modules without installing them. For example, here we use the `math` module. 

- Check all built-in python libraries distributed with Python 3.12: https://docs.python.org/3.12/library/index.html

In [None]:
import math 
# Documentation: https://docs.python.org/3.12/library/math.html
# Some functions and constants in the math module: ceil, floor, sqrt, log10, pi.

print(math.ceil(1.44))
print(math.floor(10.44))
print(math.sqrt(16))
print(math.log10(100))

### 1.2 Third-party packages
Unlike built-in modules, third-party Python packages **must be installed before they can be imported and used**.
- In Jupyter Notebook, we can use ``pip`` to install packages in the current environment.
- Here let's install the three third-party packages (``pandas``, ``numpy``, ``matplotlib``) that we will use today.  

In [None]:
# Uncomment the line below to install the packages (only needed once per environment)
# %pip install pandas numpy matplotlib
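
Before installing anything, it can help to check which packages the current environment already provides. The following is a minimal sketch using only the standard library; the helper name `is_installed` is an illustrative assumption, not part of any package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Check one built-in module and the three third-party packages used today
for name in ["math", "pandas", "numpy", "matplotlib"]:
    status = "available" if is_installed(name) else "missing (install it with pip)"
    print(f"{name}: {status}")
```

A built-in module such as `math` should always be reported as available, while the other three depend on what has been installed.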

Import ``pandas`` and ``numpy`` first.

In [None]:
import pandas as pd  
# Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/

import numpy as np
# Documentation: https://numpy.org/doc/stable/reference/

## 2. Arrays, Dictionary and Data frames 

### 2.1 Handle arrays with `numpy`

In [None]:
arr = np.array([[1,2,6],[4,5,1]])
arr

In [None]:
arr.shape   

In [None]:
print(arr[0])    # print the values in the first row
print(arr[0,0])  # print the value in the first row, first column

In [None]:
print(np.max(arr))         # max value of the flattened array
print(np.max(arr,axis=0))  # max of each column (compares values down the rows)
print(np.max(arr,axis=1))  # max of each row (compares values across the columns)
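
The `axis` argument is easy to mix up, so here is a small sketch that rebuilds the same 2 x 3 array and checks the shape of each result. The variable names `col_max` and `row_max` are chosen for illustration only.

```python
import numpy as np

arr = np.array([[1, 2, 6],
                [4, 5, 1]])      # same 2 x 3 array as above

col_max = np.max(arr, axis=0)    # collapses axis 0 (the rows): one max per column
row_max = np.max(arr, axis=1)    # collapses axis 1 (the columns): one max per row

print(col_max, col_max.shape)    # three values, one for each column
print(row_max, row_max.shape)    # two values, one for each row
```

One way to remember it: the axis you pass is the one that disappears from the result.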

In [None]:
arr + 10   # element-by-element computation

In [None]:
arr**2    # raise each element to its 2nd power
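
To see why element-by-element computation matters, the sketch below contrasts a plain Python list with a `numpy` array. The names `values` and `arr_1d` are illustrative assumptions.

```python
import numpy as np

values = [1, 2, 6]           # a plain Python list
arr_1d = np.array(values)    # the same numbers as a numpy array

print(values + [10])   # lists: + concatenates, so the list gets longer
print(arr_1d + 10)     # arrays: + adds 10 to every element
print(values * 2)      # lists: * repeats the whole list
print(arr_1d * 2)      # arrays: * doubles every element
```

The same operators behave differently on the two types, which is one reason numerical work is usually done with arrays rather than lists.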

### 2.2 Handle data frames with `pandas`

We can create a data frame from an array.

In [None]:
df = pd.DataFrame(data = arr, 
                  columns=['a', 'b', 'c'], 
                  index = ['r1','r2']) 
df    # display the data frame

In [None]:
df.shape      # check data shape

In [None]:
df.columns     # check column names

In [None]:
df.index       # check row names

In [None]:
df['a']        # select a column by its name

We can also create a data frame from a dictionary of `key: value` pairs. See the [Dictionary Tutorial](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) for details about dictionaries. 
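
As a quick refresher, here is a minimal sketch of how a dictionary works on its own. The single-car record `car` is a hypothetical example, not data from this notebook.

```python
car = {"brand": "Ford", "year": 1964}   # hypothetical single-car record

print(car["brand"])        # look up a value by its key
car["colors"] = "red"      # add a new key: value pair
print(list(car.keys()))    # keys keep their insertion order
print(car.get("price"))    # .get returns None when the key is missing
```

When a dictionary is passed to `pd.DataFrame`, each key becomes a column name and each value (a list) becomes that column's data.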


In [None]:
car_dict = {
  "brand": ["Ford",'BMW','Volkswagen','Benz','Benz','Benz','Volkswagen','Volkswagen'],
  "electric": [False,True,False,True,True,False,True,False],
  "year": [1964,1980,2000,1990,2011,2000,2000,2022],
  "colors": ["red", "white", "blue",'white','red','blue','white','white'],
  "price":[500,1000,700,1200,100,100,80,130]
}

car_dict

In [None]:
car_df = pd.DataFrame(car_dict)

car_df

In [None]:
car_df.set_index('brand', inplace=True)   # set the "brand" column as the index, modifying car_df in place 

car_df

In [None]:
car_df.reset_index(inplace = True)  # reset the index to the default integers (0, 1, 2, ...) 

car_df
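
The cells above change `car_df` in place. As an alternative sketch, both methods also return a new data frame when `inplace` is left at its default, which keeps the original untouched. The small frame `cars` and the names `indexed` and `restored` are illustrative assumptions.

```python
import pandas as pd

# Small illustrative frame (not the full car_df)
cars = pd.DataFrame({"brand": ["Ford", "BMW"], "price": [500, 1000]})

indexed = cars.set_index("brand")    # returns a new data frame; `cars` is unchanged
print(cars.columns.tolist())         # "brand" is still a column in the original
print(indexed.columns.tolist())      # "brand" has moved into the index
print(indexed.loc["BMW", "price"])   # rows can now be selected by brand

restored = indexed.reset_index()     # "brand" becomes a regular column again
print(restored.columns.tolist())
```

Assigning the result to a new name makes it easier to rerun a cell without side effects, at the cost of keeping an extra copy in memory.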

Save the data frame ``car_df`` above as a CSV file named ``car_df.csv`` in the work folder where this notebook is located (i.e., ``/Users/jingliu/OneDrive - Hong Kong Baptist University/Bridging_python``). Because that folder is the current working directory (CWD), giving just the file name is enough. 

In [None]:
car_df.to_csv('car_df.csv',index = False)    # save the data frame as a CSV file in the CWD, without writing the index
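
To see what `index = False` changes, the sketch below compares the CSV text produced with and without the index. It relies on `to_csv` returning the CSV as a string when no file name is given; the small frame `cars` is an illustrative assumption.

```python
import pandas as pd

# Small illustrative frame (not the full car_df)
cars = pd.DataFrame({"brand": ["Ford", "BMW"], "price": [500, 1000]})

with_index = cars.to_csv()                # no file name: the CSV text is returned
without_index = cars.to_csv(index=False)

print(with_index.splitlines()[0])      # header starts with an empty field for the index
print(without_index.splitlines()[0])   # header contains only the column names
```

Leaving the index out avoids an extra unnamed column when the file is read back later.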

### 2.3 Simple Data Visualization with `matplotlib`

Here we have saved a CSV file named ``diabetes.csv`` in the ``Data`` folder in the current working directory (CWD). 

- The absolute path to the folder is `/Users/jingliu/OneDrive - Hong Kong Baptist University/Bridging_python/Data`
- Note that we already imported pandas earlier. 
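
If reading the file fails, it is often because the relative path does not point where we expect. The following is a minimal sketch, using the standard `pathlib` module, of how a relative path is resolved against the CWD; the variable names are illustrative assumptions.

```python
from pathlib import Path

relative_path = Path("Data") / "diabetes.csv"   # relative to the CWD
absolute_path = Path.cwd() / relative_path      # the location Python will actually look in

print(Path.cwd())                    # the current working directory
print(relative_path.is_absolute())   # a relative path does not start from the root
print(absolute_path.is_absolute())   # joining it with the CWD gives an absolute path
print(absolute_path.exists())        # False means the file is not where the path points
```

Relative paths keep a notebook portable, since the same code works on any machine that has the same folder layout next to the notebook.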

In [None]:
df = pd.read_csv('Data/diabetes.csv')  ## Read in a csv file - relative path

df.head()

Visualize the relationship between ``Age`` and ``BMI`` with a simple scatter plot. 

- We need to import ``matplotlib`` first.

In [None]:
import matplotlib.pyplot as plt     # Documentation: https://matplotlib.org/stable/

fig = plt.figure(figsize=(10,6))   # create a new figure with the given figsize (width, height) in inches
plt.scatter(df['Age'], df['BMI'], color='lightblue')
plt.xlabel("Age")
plt.ylabel("BMI")
plt.title('BMI and Age');

In [None]:
fig.savefig('a_simple_plot.jpeg')   # save the figure in the current working directory  

Finally, let's save our work in a readable format.

- First, go to `Kernel` -> `Restart Kernel and Run all Cells`
- Then go to `File` -> `Save and Export Notebook As` -> `HTML` or `PDF`. 
