## Advanced Python Data Types
- Built-in Python collection types: ```List```, ```Tuple```, ```Set```, and ```Dictionary```
- Python also has rich third-party libraries: ```numpy```, ```pandas```

# Tuples
- Tuples are used to store multiple items in a single variable.
- A tuple is a collection which is ordered and unchangeable.
- Tuples are written with round brackets.

**Example**:
```
student_info = ("Johnny", 23, "johnny@ggmail.com")
print(student_info)
```

# Tuple Items
- Tuple items are ordered, unchangeable, and allow duplicate values.
- Tuple items are indexed: the first item has index [0], the second item has index [1], and so on.

```
print(student_info[1])
```
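
The "unchangeable" property can be seen in a small sketch. It reuses the `student_info` tuple from the example; the `scores` tuple and the other variable names are made up for illustration.

```python
student_info = ("Johnny", 23, "johnny@ggmail.com")

# Assigning to an item raises TypeError because tuples are immutable.
try:
    student_info[1] = 24
    changed = True
except TypeError as err:
    changed = False
    print("Cannot modify a tuple:", err)

# To "change" a value, build a new tuple instead.
updated_info = (student_info[0], 24, student_info[2])
print(updated_info)

# Duplicate values are allowed; count() reports how often a value appears.
scores = (90, 85, 90)
print(scores.count(90))
```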

# Unpack a tuple
```
student_info = ("Johnny", 23, "johnny@ggmail.com")
(name, age, email) = student_info
print(email)
```

# Set
- A set is a collection which is unordered and unindexed, and it does not allow duplicate values.
- In Python, sets are written with curly brackets: ```{}```.

**Example**:
```
names = {"April", "Winnie", "Cherry"}
print(names)
names.add("Johnny")
print(names)
names.add("April")
print(names)
names.remove("Winnie")
print(names)
for name in names:
  print(name)
```


# Set Methods
Set has many built-in methods
- ```add()```	Adds an element to the set
- ```clear()```	Removes all the elements from the set
- ```copy()```	Returns a copy of the set
- ```difference()```	Returns a set containing the difference between two or more sets
- ```difference_update()```	Removes the items in this set that are also included in another, specified set
- ```discard()```	Removes the specified item; does nothing if the item is not in the set
- ```intersection()```	Returns a set, that is the intersection of two other sets
- ```intersection_update()```	Removes the items in this set that are not present in other, specified set(s)
- ```isdisjoint()```	Returns whether two sets have no elements in common
- ```issubset()```	Returns whether another set contains this set or not
- ```issuperset()```	Returns whether this set contains another set or not
- ```pop()```	Removes and returns an arbitrary element from the set
- ```remove()```	Removes the specified element; raises ```KeyError``` if it is not in the set
- ```symmetric_difference()```	Returns a set with the symmetric difference of two sets
- ```symmetric_difference_update()```	Updates this set with the symmetric difference of this set and another
- ```union()```	Return a set containing the union of sets
- ```update()```	Update the set with the union of this set and others
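
A few of these methods in action, as a minimal sketch. The two sets and their names (`morning_class`, `evening_class`) are invented for illustration.

```python
morning_class = {"April", "Winnie", "Cherry"}
evening_class = {"Cherry", "Johnny", "April"}

both = morning_class.intersection(evening_class)         # in both sets
either = morning_class.union(evening_class)              # in at least one set
morning_only = morning_class.difference(evening_class)   # only in the first set
one_only = morning_class.symmetric_difference(evening_class)  # in exactly one set

# sorted() is used only to get a stable print order; sets are unordered.
print(sorted(both))
print(sorted(either))
print(sorted(morning_only))
print(sorted(one_only))

# discard() silently ignores a missing item, remove() raises KeyError.
morning_class.discard("Nobody")
try:
    morning_class.remove("Nobody")
    remove_failed = False
except KeyError:
    remove_failed = True
print(remove_failed)
```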

# Dictionary
- A dictionary is a collection which is changeable and indexed by key. Since Python 3.7 it also keeps its items in insertion order.
- In Python, dictionaries are written with curly brackets, and they have ***keys*** and ***values***.

**Create and print a dictionary**:
```
client = {
    "id": 888,
    "company": "ABC Co.",
    "email": "hello@abc.com"
}
print(client)
print(client["email"])
print(client.get("company"))
client["company"] = "XYZ Co."
print(client)
```

# Loop Through a Dictionary
Loop through a dictionary by using a ```for``` loop
```
client = {
    "id": 888,
    "company": "ABC Co.",
    "email": "hello@abc.com"
}
```

**Getting the keys**:
```
for x in client:
    print(x)
```

**Getting the values**:
```
for x in client.values():
    print(x)
```

**Getting both the keys and the values**:
```
for key, value in client.items():
    print(key, value)
```

# Check if Key Exists
To determine if a specified key is present in a dictionary, use the `in` keyword:
```
client = {
    "id": 888,
    "company": "ABC Co.",
    "email": "hello@abc.com"
}
if "email" in client:
    print("email is in the dictionary")
```
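
A short sketch of how the `in` check relates to `get()`, using the same `client` dictionary. The `"phone"` key and the `"n/a"` default are made up for illustration.

```python
client = {
    "id": 888,
    "company": "ABC Co.",
    "email": "hello@abc.com"
}

# "in" looks at the keys, not at the values.
has_email_key = "email" in client
has_company_value = "ABC Co." in client
print(has_email_key, has_company_value)

# get() returns None (or a default you supply) when the key is missing,
# whereas client["phone"] would raise KeyError.
phone = client.get("phone")
phone_or_default = client.get("phone", "n/a")
print(phone, phone_or_default)
```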

# Adding Items
Add an item to the dictionary by assigning a value to a new key:
```
client = {
    "id": 888,
    "company": "ABC Co.",
    "email": "hello@abc.com"
}
client["address"] = "Hong Kong"
print(client)
```

# Reading JSON Data From the Web
- You need to use `urllib` and `json`
- Let's try to read weather forecast from: https://data.weather.gov.hk/weatherAPI/opendata/weather.php?dataType=fnd&lang=tc
- More information about weather data: https://www.hko.gov.hk/tc/abouthko/opendata_intro.htm
- More open data at https://data.gov.hk/en/
- Install a JSON viewer extension in Google Chrome to make JSON data more readable
- Firefox has a built-in JSON viewer

**Example**:
```
import urllib.request, json 
url = "https://data.weather.gov.hk/weatherAPI/opendata/weather.php?dataType=fnd&lang=tc"
with urllib.request.urlopen(url) as req:
    data = json.loads(req.read().decode())
    print(data)
    print(data["generalSituation"])
    print(data["weatherForecast"])
    print(data["weatherForecast"][0])
```
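
To see what `json.loads()` produces without a network connection, here is a small offline sketch. The top-level keys match the ones used in the example; the inner fields (`day`, `note`) and all the values are invented placeholders, not the real API response.

```python
import json

# Hand-written sample text; only the two top-level key names come from the example.
sample_text = """
{
  "generalSituation": "Sample text describing the general situation.",
  "weatherForecast": [
    {"day": "Mon", "note": "sample entry 1"},
    {"day": "Tue", "note": "sample entry 2"}
  ]
}
"""

data = json.loads(sample_text)

# A JSON object becomes a dict, a JSON array becomes a list.
print(type(data))
print(type(data["weatherForecast"]))

# Items in the list are dictionaries, so they can be indexed by key.
first = data["weatherForecast"][0]
print(first["day"])

for entry in data["weatherForecast"]:
    print(entry["day"], "-", entry["note"])
```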

---
# numpy
- NumPy is a Python library used for working with arrays.
- It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
- NumPy stands for Numerical Python.
___

# Why `numpy`
- In Python we have lists that serve the purpose of arrays, but they are slow to process.
- NumPy aims to provide an array object that is much faster than a Python list.
- The array object in NumPy is called `ndarray`; it comes with many supporting functions that make it easy to work with.
- Arrays are very frequently used in data science, where speed and resources are very important.
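
A rough way to see the difference yourself, assuming NumPy is already installed. This is only a sketch: the array size is arbitrary and the timings depend on your machine, so no specific numbers are claimed here.

```python
import time
import numpy as np

n = 200_000
numbers_list = list(range(n))
numbers_arr = np.arange(n, dtype=np.int64)  # int64 so the squares cannot overflow

# Square every element with a plain Python loop (list comprehension).
start = time.perf_counter()
squares_list = [x * x for x in numbers_list]
list_seconds = time.perf_counter() - start

# Square every element with one NumPy operation.
start = time.perf_counter()
squares_arr = numbers_arr * numbers_arr
array_seconds = time.perf_counter() - start

print("list :", list_seconds, "seconds")
print("array:", array_seconds, "seconds")
print("same result:", squares_arr.tolist() == squares_list)
```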



# Installation of numpy
NumPy is already included in Anaconda, so there is no need to install it.

If you use regular Python, you can install it with pip:
```
pip install numpy
```

# import it before you use it
- Once NumPy is installed, bring it into your application with the `import` keyword.
- You will see an error message if you use it without importing it first.

```
import numpy
```

# numpy as np
NumPy is usually imported under the np alias.
```
import numpy as np
print(np.__version__)

```

# Create a numpy ```ndarray``` object
- NumPy is used to work with arrays. The array object in NumPy is called ```ndarray```.
- We can create a NumPy ```ndarray``` object by using the ```array()``` function.
```
import numpy as np
regular_array = [1, 2, 3, 4, 5]
arr = np.array(regular_array)
print(arr)
print(type(arr))
```


# Dimensions in Arrays
- A dimension in arrays is the level of array depth (nested arrays).

**0-D Arrays**:
```
import numpy as np
arr = np.array(42)
print(arr)
```
**1-D Arrays**:
```
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)

```
**2-D Arrays**:
```
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

```
**3-D Arrays**:
```
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)

```
**Check number of dimensions**:
```
print(arr.ndim)
```

# Array Indexing
```
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
arr2 = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st dim: ', arr2[0, 1])
```

# Slicing Arrays
- Slicing in python means taking elements from one given index to another given index.
- We pass slice instead of index like this: ```[start:end]```.
- We can also define the step, like this: ```[start:end:step]```.
- If we don't pass start, it's considered 0
- If we don't pass end, it's considered the length of the array in that dimension
- If we don't pass step, it's considered 1
```
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
print(arr[1:])
print(arr[0:-3])
print(arr[::2])
```

# Slicing 2-D Arrays
```
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])
print(arr[:, 1:4])
print(arr[0:2, 2])
print(arr[0:2, 1:4])
```

# Data Type in numpy
- ```i``` - integer
- ```b``` - boolean
- ```u``` - unsigned integer
- ```f``` - float
- ```c``` - complex float
- ```m``` - timedelta
- ```M``` - datetime
- ```O``` - object
- ```S``` - string
- ```U``` - unicode string
- ```V``` - fixed chunk of memory for other type ( void )
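
These one-letter codes can be passed as `dtype` when creating an array, or to `astype()` when converting one. The sketch below uses made-up values; the exact sizes (for example `float32` versus `float64`) depend on NumPy's defaults for each code.

```python
import numpy as np

# Ask for a specific type when creating the array.
as_float = np.array([1, 2, 3], dtype="f")
as_bytes = np.array([1, 2, 3], dtype="S")
print(as_float, as_float.dtype)
print(as_bytes, as_bytes.dtype)

# Convert an existing array; float to integer drops the fractional part.
as_int = np.array([1.9, 2.1, 3.5]).astype("i")
print(as_int, as_int.dtype)

# dtype.kind gives back the one-letter code of an array's type.
print(np.array([True, False]).dtype.kind)
print(np.array(["apple", "banana"]).dtype.kind)
```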

# Checking the Data Type of an Array
- The NumPy array object has a property called `dtype` that returns the data type of the array
```
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr.dtype)
arr2 = np.array(['apple', 'banana', 'cherry'])
print(arr2.dtype)
```

# Shape of an Array
- The shape of an array is the number of elements in each dimension.
- NumPy arrays have an attribute called shape that returns a tuple with each index having the number of corresponding elements.
```
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
```


# Reshaping Arrays
- Reshaping means changing the shape of an array.
- The shape of an array is the number of elements in each dimension.
- By reshaping we can add or remove dimensions or change number of elements in each dimension.
```
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
arr2d = arr.reshape(4, 3)
arr3d = arr.reshape(2, 3, 2)
print(arr2d)
print(arr2d.shape)
print(arr3d)
print(arr3d.shape)
print(type(arr3d.shape))
```

# `arange()` method
- `arange()` is one of the array creation routines based on numerical ranges. 
- It creates an instance of ndarray with evenly spaced values and returns the reference to it.

**Syntax**:
```
numpy.arange([start, ]stop, [step, ]dtype=None) -> numpy.ndarray
```

**Example**:
```
np.arange(0, 20, 2)
```
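
A few more calls, as a sketch of how `start`, `stop`, and `step` interact. Note that `stop` itself is never included. The variable names are made up for illustration.

```python
import numpy as np

evens = np.arange(0, 20, 2)       # 0, 2, ..., 18 (20 is excluded)
first_five = np.arange(5)         # only stop given: start defaults to 0
countdown = np.arange(10, 0, -2)  # a negative step counts down
halves = np.arange(0, 2, 0.5)     # the step can be a float

print(evens)
print(first_five)
print(countdown)
print(halves)
```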


# `linspace()` method
Creates an array of `num` evenly spaced values between `start` and `stop`; by default `stop` is included.

**Syntax**:
```
numpy.linspace(start, stop, num=50, endpoint=True, ...) -> numpy.ndarray
```

**Example**:
```
np.linspace(0, 100, 10)
```
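
Unlike `arange()`, the third argument of `linspace()` is the number of points, not the step. The sketch below compares a few calls; the variable names are made up.

```python
import numpy as np

# 10 points from 0 to 100, both ends included: the gap is 100/9, not 10.
ten_points = np.linspace(0, 100, 10)

# 11 points give the "round" values 0, 10, 20, ..., 100.
eleven_points = np.linspace(0, 100, 11)

# endpoint=False leaves out the stop value: 0, 10, ..., 90.
open_end = np.linspace(0, 100, 10, endpoint=False)

print(ten_points)
print(eleven_points)
print(open_end)
```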

# Flattening the arrays
- Flattening array means converting a multidimensional array into a 1D array.
- We can use ```reshape(-1)``` to do this.

**Example**:
```
flattened = arr3d.reshape(-1)
print(flattened)
```

# Re-arranging the elements of array
- NumPy has many functions for changing the shape of arrays, such as `flatten` and `ravel`, and for rearranging elements, such as `rot90`, `flip`, `fliplr`, and `flipud`.
- These fall under Intermediate to Advanced section of numpy.
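
For a first impression, here is a minimal sketch of these functions on a tiny 2x2 array. The array `grid` and the variable names are made up for illustration.

```python
import numpy as np

grid = np.array([[1, 2], [3, 4]])

flat_copy = grid.flatten()  # always returns a copy
flat_view = grid.ravel()    # returns a view of the same data when possible
flat_copy[0] = 99           # changing the copy does not touch grid
print(grid[0, 0], np.shares_memory(grid, flat_view))

left_right = np.fliplr(grid)  # reverse the order of the columns
up_down = np.flipud(grid)     # reverse the order of the rows
both_axes = np.flip(grid)     # reverse along every axis
rotated = np.rot90(grid)      # rotate 90 degrees counterclockwise

print(left_right)
print(up_down)
print(both_axes)
print(rotated)
```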

# Iterating Arrays
- Iterating means going through elements one by one.
- Since NumPy arrays can be multi-dimensional, we iterate over them with Python's basic `for` loop.
- If we iterate on a 1-D array it will go through each element one by one.
```
import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)
```

**Note**: Iterating over a 2-D array goes through the rows one by one, so each loop step yields a whole row.

# Iterating 2D and 3D Array
- Iterate on each scalar element of a multi dimension array:

**Iterating 2D Array**:
```
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
  for y in x:
    print(y)
```

**Iterating 3D Array**:
```
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
  for y in x:
    for z in y:
      print(z)
```

# Using `nditer()`
- The function `nditer()` is a helper that supports everything from very basic to very advanced iteration. 
- It solves some basic issues we face when iterating; for example, it visits every scalar element without nested loops.

```
import numpy as np
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
for x in np.nditer(arr):
  print(x)
```

# Joining numpy array
- Joining means putting contents of two or more arrays in a single array.
- In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.
- We pass a sequence of arrays that we want to join to the `concatenate()` function, along with the axis. If axis is not explicitly passed, it is taken as `0`.

```
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
```


# Joining 2D array
```
import numpy as np
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[10, 20], [30, 40]])
arr = np.concatenate((arr1, arr2), axis=0)
print(arr)
```
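
The effect of the `axis` argument is easiest to see by comparing shapes. This sketch reuses the two arrays from the example; the result names are made up.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[10, 20], [30, 40]])

# axis=0 puts arr2 below arr1: more rows, same number of columns.
stacked_rows = np.concatenate((arr1, arr2), axis=0)

# axis=1 puts arr2 to the right of arr1: same rows, more columns.
stacked_cols = np.concatenate((arr1, arr2), axis=1)

print(stacked_rows.shape)
print(stacked_cols)
print(stacked_cols.shape)
```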

# Splitting Array
- Use `array_split()` for splitting arrays
- Pass it the array we want to split and the number of splits

```
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
print(newarr[0])
```
*The return value is a list containing three arrays*

# Splitting 2D Array
Use the same syntax when splitting 2-D arrays.
```
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
newarr = np.array_split(arr, 3)
print(newarr)
```

# Splitting Axis - row
- You can specify which axis to split along.
- The default is `axis=0`, which splits the rows into groups; use `axis=1` to split the columns into groups.
```
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
newarr = np.array_split(arr, 3, axis=0)
newarr2 = np.array_split(arr, 3, axis=1)
print(newarr)
print(newarr[0])
print(newarr2)
print(newarr2[0])
```

# Search
- You can search an array for a certain value, and return the ***indexes*** that get a match.
- To search an array, use the `where()` function.

**Example**: Find the indexes(position) where the value is 4
```
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
found = np.where(arr == 4)  # indexes where the value equals 4
print(found)
for pos in found:
    print(arr[pos])
```
**Returned value**:
- The example above will return a tuple: `(array([3, 5, 6]),)`
- Which means that the value 4 is present at index 3, 5, and 6.


# More Search

**Find the indexes where the values are even**:
```
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0) # % is remainder operator
print(x)
```

**Find the indexes where the values are odd**:
```
import numpy as np
arr = np.arange(20)
x = np.where(arr%2 == 1)
print(x)
```

# Sorting Array: `sort()`
- Sorting means putting elements in an ordered sequence.
- Ordered sequence is any sequence that has an order corresponding to elements, like numeric or alphabetical, ascending or descending.

```
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
```
*`np.sort()` returns a sorted copy of the array and leaves the original array unchanged; it does not sort in place*

# Sorting 2D Array
If you use `sort()` on a 2-D array, each row is sorted independently:
```
import numpy as np
arr = np.array([[3, 2, 4], [5, 0, 1]])
print(np.sort(arr))
```
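
Sorting along a different axis, in descending order, or in place can be sketched as follows. It reuses the array from the example; the `axis` argument and the array method `sort()` are part of NumPy, and the variable names are made up.

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)               # default: sort within each row
by_column = np.sort(arr, axis=0)    # sort within each column instead
descending = np.sort(arr)[:, ::-1]  # sort each row, then reverse it

# The array method sort() works in place, unlike np.sort().
in_place = arr.copy()
in_place.sort()

print(by_row)
print(by_column)
print(descending)
print(arr)       # the original is still unsorted
print(in_place)
```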

# Filter
- Getting some elements out of an existing array and creating a new array out of them is called filtering.
- You filter an array using a boolean index list: `[True, False, True, False]`
- A boolean index list is a list of booleans corresponding to indexes in the array.
- If the value at an index is `True`, that element is contained in the filtered array, if the value at that index is `False`, that element is excluded from the filtered array.

**Example**:
```
import numpy as np
arr = np.arange(40, 45)
x = [True, False, False, False, True]
newarr = arr[x]
print(newarr)
```

# Create the Filter Array
- In the example above we hard-coded the True and False values, but the common use is to create a filter array based on conditions.

**Example**:
```
import numpy as np
arr = np.arange(40, 45)
filter_arr = []
for element in arr:
    if element > 42:
        filter_arr.append(True)
    else:
        filter_arr.append(False)
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)
```

# A neat approach to create filter array
A shorthand for creating the filter array: `filter_arr = arr > 42`

```
import numpy as np
arr = np.arange(40, 45)
filter_arr = arr > 42
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)
```

# Exercise: 
Create a filter array that marks the positions of the **even** numbers
```
import numpy as np
numbers = np.arange(0,21)
filter = numbers%2==0
filtered_number = numbers[filter]
print(filtered_number)
```

# A cleaner version of the code above
```
numbers[numbers%2==0]
```

# Exercise: 
Create a filter array that marks the positions of the **odd** numbers
```
import numpy as np
numbers = np.arange(0,21)
filter = ??????
filtered_number = numbers[filter]
print(filtered_number)
```

# Exercise: 
Create a filter array that marks the positions of the **positive** numbers
```
import numpy as np
profits = np.array([100, -100, 200, -200, 300, -300])
filter = ?????
earning = profits[filter]
print(earning)
```

# Random
NumPy offers the `random` module to work with random numbers.

**Example - Generating a random int**:
```
from numpy import random
x = random.randint(100)
print(x)
```

**Example - Generating a random float**:
```
from numpy import random
x = random.rand()
print(x)
```

**To rerun the same cell**:
Hold `COMMAND` (Mac) / `CTRL` (Windows) and press `ENTER`

# Generating Random Array
The `randint()` method takes a size parameter where you can specify the shape of an array.

**Example**:
```
from numpy import random
x=random.randint(100, size=(3))
y=random.randint(100, size=(3,4))
z=random.rand(4,5)
print(x)
print(y)
print(z)
```
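
Every run of the example gives different numbers. If you need the same "random" numbers again, for example to reproduce a result, one option is to set a seed first. This is a minimal sketch using `random.seed()`; the seed value 42 is arbitrary.

```python
from numpy import random

# Setting the same seed before each call reproduces the same numbers.
random.seed(42)
first_run = random.randint(100, size=5)

random.seed(42)
second_run = random.randint(100, size=5)

print(first_run)
print(second_run)
print((first_run == second_run).all())
```

Note that `randint(100)` draws integers from 0 up to, but not including, 100.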

# Generating random number from array
- The choice() method allows you to generate a random value based on an array of values.
- The choice() method takes an array as a parameter and randomly returns one of the values.

```
from numpy import random
x = random.choice(["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"])
y = random.choice(["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"], size=(3,2))
z = random.choice(["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"], size=(3,2))

print(x)
print(y)
print(z)
```

# NumPy ufuncs
- ufuncs stands for "Universal Functions"
- They are NumPy functions that operate on the `ndarray` object.
- ufuncs are used to implement vectorization in NumPy, which is much faster than iterating over elements.

**Add the elements of two lists**:

```
x = [1, 2, 3, 4]
y = [100, 200, 300, 400]
z = []
for i, j in zip(x, y):
  z.append(i + j)
print(z)
```

# `add()` function
You can use `add()` function to achieve the same effect as above
```
import numpy as np
x = [1, 2, 3, 4]
y = [100, 200, 300, 400]
z = np.add(x, y)  # element-wise addition; accepts lists and returns an ndarray
print(z)
```

# Ufunc Simple Arithmetic
You can use the arithmetic operators `+` `-` `*` `/` directly between NumPy arrays.

The ufuncs extend this: they accept any array-like object (lists, tuples, etc.) and can perform arithmetic conditionally. The example below also shows how the same operators behave differently on plain lists and on NumPy arrays.

```
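import numpy as np
a = [1, 2, 3, 4]
b = [100, 200, 300, 400]
x = np.array([1, 2, 3, 4])
y = np.array([100, 200, 300, 400])
print(a)
print(x)
print(type(a))
print(type(x))
print(a*2)   # list: repeats the list
print(x*2)   # ndarray: multiplies each element by 2
print(a+b)   # list: concatenates the two lists
print(x+y)   # ndarray: adds element by element
print(x+10)
print(x*10)
print(x/10)
print(x%y)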
```

# Working with String Type

numpy has functions to deal with string types

```
import numpy as np
students = ["April", "Ben", "Cathy", "Danny", "Eve"]
s_nparr = np.array(students)
print(np.char.str_len(s_nparr))  # length of each name
print(s_nparr[np.char.str_len(s_nparr)==3])  # names with exactly 3 characters
```
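
Other functions in `np.char` follow the same pattern: they apply a string operation to every element. A short sketch with the same student names; the variable names are made up.

```python
import numpy as np

s_nparr = np.array(["April", "Ben", "Cathy", "Danny", "Eve"])

upper_names = np.char.upper(s_nparr)           # upper-case every name
starts_with_c = np.char.startswith(s_nparr, "C")  # boolean array, one per name

print(upper_names)
print(starts_with_c)
print(s_nparr[starts_with_c])  # the boolean array works as a filter
```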

# More on NumPy
https://numpy.org/doc/stable/user/quickstart.html#

# A little taste of Pandas
- Data analysis projects heavily rely on Pandas
- Pandas lets you load and process tabular data from sources such as CSV, Excel and JSON
- Pandas is built on top of Numpy
- You will find that Pandas syntax is very similar to Numpy
- Pandas and Numpy are often used together

To use pandas, import pandas like below

`import pandas as pd`

To check the version of pandas

`print(pd.__version__)`

Often numpy is also imported

`import numpy as np`

## Reading CSV File

```
df = pd.read_csv("./data/graduates.csv") # df: the variable, data frame, two dimensional data
type(df) # check the type of df
```

# Pandas Data Frame

```
df.info() # show brief information about df
df.shape # the dimensions of the data frame: (rows, columns)
df.head() # show the first 5 rows (the default)
df.tail() # show the last 5 rows (the default)
df.loc[0] # retrieve the row with index label 0 (the first row here)
df.loc[2:5] # retrieve the rows labelled 2 to 5, both ends included
```
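
If you do not have the CSV file at hand, you can try the same calls on a small data frame built in memory. This is only a sketch: the column names are borrowed from the examples in these notes, but every value below is invented, and `iloc` is shown just to contrast it with `loc`.

```python
import pandas as pd

# Made-up data; not taken from graduates.csv.
df = pd.DataFrame({
    "AcademicYear": ["2019/20", "2019/20", "2020/21", "2020/21"],
    "LevelOfStudy": ["Undergraduate", "Taught Postgraduate",
                     "Undergraduate", "Taught Postgraduate"],
    "Sex": ["F", "M", "F", "M"],
    "Headcount": [120, 45, 130, 50],
})

print(df.shape)       # (rows, columns)
print(df.head(2))     # head() accepts the number of rows to show
print(df.loc[1:2])    # by label: rows 1 and 2, end included
print(df.iloc[1:2])   # by position: row 1 only, end excluded

# Filter rows by condition, as with a NumPy filter array.
postgraduate = df[df["LevelOfStudy"] == "Taught Postgraduate"]
print(postgraduate["Headcount"].sum())
```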

# Pandas Visualization

- Filtering rows by condition
- Producing data visualizations from a chosen column

```
df["LevelOfStudy"].unique() # shows the unique values of a column
df["ProgrammeCategory"].unique() # shows the unique values of a column
df[df["LevelOfStudy"]=="Taught Postgraduate"] # Filtering the rows by condition
df_postgraduate_bm_male = df[(df["LevelOfStudy"]=="Taught Postgraduate") 
   & (df["ProgrammeCategory"]=="Business and Management")
   & (df["Sex"]=="M")
  ] # Filtering the rows by multiple conditions
df_postgraduate_bm_male = df_postgraduate_bm_male.set_index("AcademicYear") # setting index column for chart generation
df_postgraduate_bm_male["Headcount"].plot(kind="bar") # Plotting a bar chart
```

# Exercise
- Getting Realtime Currency Exchange Rates
- Currency Rate API: https://fixer.io/
- Register a free account

### Your Task:
Modify the following weather data fetching script to build a live currency exchange rates mini app


```
import urllib.request, json 
url = "https://data.weather.gov.hk/weatherAPI/opendata/weather.php?dataType=fnd&lang=tc"
with urllib.request.urlopen(url) as req:
    data = json.loads(req.read().decode())
    print(data)
    print(data["generalSituation"])
    print(data["weatherForecast"])
    print(data["weatherForecast"][0])
```