# Python Libraries
NumPy and pandas are two of the most widely used Python libraries for numerical computing and data analysis.

## Import - Specify what library to use
```
import numpy as np
import pandas as pd
```
To import only specific names from a library (this shortens what you type; the whole library is still loaded, so it does not save memory):
```
from numpy import mean
...
mean([1, 2, 3])
```
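
A quick way to check what a `from ... import ...` statement really loads is to look at `sys.modules`. The sketch below assumes only that NumPy is installed; the variable names are illustrative.

```python
import sys

from numpy import mean  # binds only the name `mean` in this namespace

# The full package was still imported behind the scenes.
numpy_loaded = "numpy" in sys.modules

# `mean` expects a single array-like argument, not separate numbers.
result = mean([1, 2, 3])

print(numpy_loaded)  # True
print(result)        # 2.0
```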

# Numpy
A library for mathematical and numeric processing. Pay attention to whether a NumPy operation is in-place or not.

* An in-place operation mutates the original object or value.
* An operation is not in-place if it returns a new object as the result and leaves the original unchanged.
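
The difference is easiest to see side by side. This is a minimal sketch using `np.sort` (returns a new array), `ndarray.sort` (sorts in place), and `+=` (in-place addition); the variable names are made up for the illustration.

```python
import numpy as np

a = np.array([3, 1, 2])

# Not in-place: np.sort returns a new, sorted array and leaves `a` alone.
b = np.sort(a)
print(a)  # [3 1 2]
print(b)  # [1 2 3]

# In-place: ndarray.sort reorders `a` itself and returns None.
returned = a.sort()
print(a)         # [1 2 3]
print(returned)  # None

# Augmented assignment is in-place as well; `a + 10` would build a new array.
c = a
a += 10
print(c)  # [11 12 13]  (c and a are the same object)
```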

In [1]:
import numpy as np

In [3]:
numbers = [1, 2, 3, 4, 5]
type(numbers)

list

In [3]:
print ("mean: ", np.mean(numbers))
print ("average: ", np.average(numbers))

mean:  3.0
average:  3.0


## numpy's Array Is Not The Same As Python List

1. **Data Type**:
   - **Python List**: Can contain elements of different data types (e.g., integers, strings, floats).
   - **NumPy Array**: Typically contains elements of the same data type, which makes it more efficient for numerical operations.

2. **Performance**:
   - **Python List**: Generally slower for numerical computations due to its flexibility and dynamic nature.
   - **NumPy Array**: Optimized for numerical operations and can handle large datasets more efficiently.

3. **Memory Usage**:
   - **Python List**: Consumes more memory because each element is a separate Python object and the list stores a reference to it.
   - **NumPy Array**: Consumes less memory because it stores raw values of a single type in one block of memory.

4. **Mathematical Operations**:
   - **Python List**: Cannot directly perform element-wise mathematical operations. You need to use loops or list comprehensions.
   - **NumPy Array**: Supports element-wise operations, making it easier and faster to perform mathematical computations.

5. **Functionality**:
   - **Python List**: Offers a wide range of built-in methods for general-purpose programming.
   - **NumPy Array**: Provides a rich set of functions specifically designed for numerical and scientific computations.

In summary, if you're working with numerical data and need efficient computations, **NumPy arrays** are the way to go. For general-purpose programming with mixed data types, **Python lists** are more suitable.
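
A few of these differences can be observed directly. The following is a small sketch, not a benchmark; the `int64` dtype is set explicitly so the byte counts are the same on every platform.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3], dtype=np.int64)

# `*` repeats a list but multiplies an array element by element.
print(py_list * 2)  # [1, 2, 3, 1, 2, 3]
print(np_arr * 2)   # [2 4 6]

# Element-wise math on a list needs an explicit loop or comprehension.
print([x * 2 for x in py_list])  # [2, 4, 6]

# Mixed inputs are coerced to one common dtype in an array.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Each array element takes a fixed number of bytes (8 for int64).
print(np_arr.itemsize, np_arr.nbytes)  # 8 24
```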

In [8]:
# Note the 'type' value in the output
a=np.linspace(1, 7, 5)
print("linear space: ", a)
print("type of a: ", type(a))

linear space:  [1.  2.5 4.  5.5 7. ]
type of a:  <class 'numpy.ndarray'>


## Conversion Between NumPy Array and Python List

In [7]:
# create np array from a Python list
nums = [1,2,3]
nums_np = np.array(nums)
print("type(nums): ", type(nums))
print("type(nums_np): ", type(nums_np))

nums_list = nums_np.tolist()
print("type(nums_list): ", type(nums_list))

type(nums):  <class 'list'>
type(nums_np):  <class 'numpy.ndarray'>
type(nums_list):  <class 'list'>
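
One detail worth knowing when converting back: `tolist()` and the built-in `list()` do not give the same result for multi-dimensional arrays. A short sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# tolist() converts recursively: nested Python lists of Python ints.
as_nested = arr.tolist()
print(as_nested)              # [[1, 2], [3, 4]]
print(type(as_nested[0][0]))  # <class 'int'>

# list() only unpacks the first axis: a list of 1-D NumPy arrays.
as_rows = list(arr)
print(type(as_rows[0]))       # <class 'numpy.ndarray'>
```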


## Broadcasting in `numpy`
Broadcasting is the NumPy mechanism that allows arithmetic operations on arrays of different shapes. NumPy treats the smaller array as if it were stretched to match the larger one, provided the two shapes are compatible under the following rules:

>Broadcasting Rules: 
>
>When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the last dimension and working backward; a missing leading dimension is treated as 1. Two dimensions are compatible when:
>
>1. They are equal, or
>2. One of them is 1
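
The rule can be written out as a few lines of plain Python. `broadcast_shape` below is a hypothetical helper written for this note, not a NumPy function; it is a sketch of the comparison described above, cross-checked against what NumPy does on real arrays.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError (hypothetical helper)."""
    result = []
    # Walk both shapes from the last dimension backward.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1  # missing dims count as 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible dimensions: {dim_a} vs {dim_b}")
    return tuple(reversed(result))

print(broadcast_shape((3, 3), (3,)))    # (3, 3)
print(broadcast_shape((2, 1), (1, 4)))  # (2, 4)

# Cross-check against what NumPy actually does.
print((np.ones((2, 1)) + np.ones((1, 4))).shape)  # (2, 4)
```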

_The following operations are valid in Python with NumPy, but they are not valid linear algebra expressions: in linear algebra, a 3-by-3 matrix and a 3-element vector cannot be added._

In [7]:
w = np.array([[1,2,3], [2,3,4], [3,4,5]])
x = np.array([2,3,4])
w + x

array([[3, 5, 7],
       [4, 6, 8],
       [5, 7, 9]])


### Benefits of Broadcasting
* Memory Efficiency: Broadcasting avoids making copies of data, saving memory.
* Performance: Operations with broadcasting can be vectorized, making them faster than using loops.
* Readability: Code using broadcasting is often more concise and easier to read.
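
The memory claim can be inspected with `np.broadcast_to`, which returns the stretched view without performing any arithmetic. This is a sketch only; the dtype is fixed to `int64` so that the stride values are predictable.

```python
import numpy as np

x = np.array([2, 3, 4], dtype=np.int64)

# View `x` as if it had three identical rows, without copying it.
stretched = np.broadcast_to(x, (3, 3))

print(stretched.shape)    # (3, 3)
print(stretched.strides)  # (0, 8): moving to the next row advances 0 bytes
print(np.shares_memory(x, stretched))  # True
```

A row stride of 0 means every row reads the same three values in memory, which is why no copy is needed.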



### Example 1: Scalar and Array

In [None]:
## If you have a scalar value and an array, broadcasting allows the 
## scalar to be applied to each element of the array:

import numpy as np
array = np.array([1, 2, 3])
scalar = 2
result = array ** scalar
print(result)  # Output: [1 4 9]

[1 4 9]


### Example 2: Two Arrays of Different Shapes

In [None]:
## Let's say you have two arrays of different shapes:
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([1, 2, 3])
result = array1 + array2
print(result)

[[2 4 6]
 [5 7 9]]


Here, array1 has the shape (2, 3) and array2 has the shape (3,). Broadcasting works by stretching array2 to match the shape of array1, effectively turning array2 into:
```python
[[1, 2, 3],
 [1, 2, 3]]
```
Then element-wise addition is performed.

### Example 3: Incompatible Shapes
If the shapes are not compatible, broadcasting will raise an error.

In [5]:
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([1, 2])
result = array1 + array2  # This will raise a ValueError

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 
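
If the intent was to add one value per row, one possible fix is to reshape the 2-element array into a column so that its trailing dimension becomes 1. A minimal sketch, assuming that row-wise addition is what was wanted:

```python
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
array2 = np.array([1, 2])                  # shape (2,)

# (2,) vs (2, 3) fails because the trailing dimensions are 2 and 3.
# Reshaping to (2, 1) makes the trailing dimension 1, which is compatible.
column = array2[:, np.newaxis]             # shape (2, 1)
result = array1 + column
print(result)
# [[2 3 4]
#  [6 7 8]]
```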

# pandas
pandas is a library for loading, manipulating, and analyzing tabular data sets.

In [5]:
import pandas as pd


## DataFrame
A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. A DataFrame has the following parts:
* Each _row_ represents a record.
* Each _column_ represents a variable or _feature_ in the dataset. Columns have data types and names.
* The _index_ identifies each row in the table; by default it is 0, 1, 2, and so on.
* _Data_ or _cells_ hold the actual data values.
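
Each of these parts can be accessed by name. The sample values below are hypothetical and exist only to label the parts of a small DataFrame.

```python
import pandas as pd

# Hypothetical sample data, used only to label the parts of a DataFrame.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima"], "temp_c": [4.5, 19.0]},
    index=["a", "b"],
)

print(df.columns.tolist())    # column names: ['city', 'temp_c']
print(df.index.tolist())      # index labels: ['a', 'b']
print(df.dtypes["temp_c"])    # column data type: float64
print(df.loc["b", "temp_c"])  # one cell: 19.0
print(df.loc["a"].tolist())   # one row: ['Oslo', 4.5]
```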

In [None]:
# Create a DataFrame from an array: 4 rows, each a 3-element array
data = np.random.randn(4, 3)
print("Array 'data':")
print(data)
df = pd.DataFrame(data=data)
print("DataFrame df:")
print(df)

### Constructing DataFrame
___Important:___ Each column of a DataFrame can be built from an array, but that array must be one-dimensional.

The typical Python error for violating this limitation is: `"ValueError: Per-column arrays must each be 1-dimensional"`

In [None]:
# create data set
var1 = np.random.randn(3,2) * 5 
var2 = np.random.randn(3,2) + 20 

# variable labels
labels = ['Temp', 'Ice cream']

# Compose a dictionary (named `data` to avoid shadowing the built-in `dict`)
data = {labels[0]: var1, labels[1]: var2}
print("dict: ", data)

# use pandas data frame
df = pd.DataFrame(data=data)
print("dataframe: ", df)

# pandas will try to fit the dictionary into a DataFrame by:
#  - placing the data of key 'Temp' in the first column
#  - placing the data of key 'Ice cream' in the second column
# However, both 'Temp' and 'Ice cream' map to a 2-dimensional array,
# which violates the requirement of DataFrame construction.


dict:  {'Temp': array([[1.10482479, 3.31318828],
       [1.79081594, 8.83284042],
       [2.11512438, 1.75888225]]), 'Ice cream': array([[21.4244878 , 19.19705313],
       [19.58061568, 19.12207088],
       [20.45794567, 18.57702693]])}


ValueError: Per-column arrays must each be 1-dimensional
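
Two ways around this error are sketched below: give each column its own 1-D array, or pass a single 2-D array and name its columns. The seeded generator is an assumption made only so the sketch is repeatable; the column labels are reused from the example above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Option 1: one 1-D array per column.
temp = rng.standard_normal(3) * 5
ice_cream = rng.standard_normal(3) + 20
df = pd.DataFrame({"Temp": temp, "Ice cream": ice_cream})
print(df.shape)  # (3, 2)

# Option 2: a single 2-D array; each array column becomes a DataFrame column.
data = rng.standard_normal((3, 2))
df2 = pd.DataFrame(data, columns=["Temp", "Ice cream"])
print(df2.shape)  # (3, 2)
```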

### Exercise
Create a pandas DataFrame with: 1) the integers from 0 to 10, 2) their squares, and 3) their natural logarithms.

In [None]:
num = np.array(range(0,11))
sqr = np.square(num)
log = np.log(num)
dataframe = pd.DataFrame({
    "num": num,
    "square": sqr,
    "log": log})
dataframe

# NumPy emits a "divide by zero" RuntimeWarning because log(0) is computed.
# Execution still succeeds, and log(0) is stored as -inf.

RuntimeWarning: divide by zero encountered in log
  log = np.log(num)


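| | num | square | log |
|---|---|---|---|
| 0 | 0 | 0 | -inf |
| 1 | 1 | 1 | 0.0 |
| 2 | 2 | 4 | 0.693147 |
| 3 | 3 | 9 | 1.098612 |
| 4 | 4 | 16 | 1.386294 |
| 5 | 5 | 25 | 1.609438 |
| 6 | 6 | 36 | 1.791759 |
| 7 | 7 | 49 | 1.94591 |
| 8 | 8 | 64 | 2.079442 |
| 9 | 9 | 81 | 2.197225 |
| 10 | 10 | 100 | 2.302585 |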
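
If the warning is expected and only adds noise, one option is to silence it locally with `np.errstate`. This is a sketch of that option, not a recommendation to hide warnings in general; the result for `log(0)` is still `-inf`.

```python
import numpy as np
import pandas as pd

num = np.arange(0, 11)

# Silence the divide-by-zero warning for this block only; log(0) is still -inf.
with np.errstate(divide="ignore"):
    log = np.log(num)

df = pd.DataFrame({"num": num, "square": np.square(num), "log": log})
print(df.loc[0, "log"])  # -inf
print(len(df))           # 11
```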


# `time`
This is another frequently used library.

`time` is included in a standard Python installation. You still need to **import** it to be able to use it. 

In [9]:
import time 

## `time.time()`
Returns the current time as a floating-point number of seconds since the epoch (1970-01-01 00:00:00 UTC).

In [11]:
time.time()

1733281637.8240397

## `time.sleep()`
Suspends execution of the current thread for the given number of seconds.

In [13]:
time.sleep(3)

## `time.perf_counter()`
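Returns the value of a performance counter in fractional seconds. A performance counter is the clock with the highest available resolution and is meant for measuring short durations. Its absolute value has no meaning; only the difference between two readings does.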

In [18]:
# record the start time
tic = time.perf_counter()

# perform any operation
r = []
for i in range(100):
    r.append(i ** 2)

# elapsed time, converted from seconds to milliseconds
elapsed_ms = 1000 * (time.perf_counter() - tic)
print("time elapsed: " + str(elapsed_ms) + " ms")

time elapsed: 0.05239993333816528 ms
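
When the same measurement is needed in several places, the start/stop pattern can be wrapped in a context manager. `timed` is a hypothetical helper written for this note, not part of the `time` module; it is one possible way to package the pattern.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the enclosed block took (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so the timing is always reported.
        elapsed_ms = 1000 * (time.perf_counter() - start)
        print(f"{label}: {elapsed_ms:.3f} ms")

with timed("squares"):
    squares = [i ** 2 for i in range(100)]
```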


## Other time operations
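* Time conversion:
    * `time.gmtime()` - converts seconds since the epoch to a `struct_time` in UTC (defaults to the current time)
    * `time.localtime()` - the same, but in local time
    * `time.mktime()` - converts a local-time `struct_time` back to seconds since the epoch
* Time formatting - `time.strftime()` turns a `struct_time` into a string
* Time parsing - `time.strptime()` turns a string into a `struct_time`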



In [21]:
time.gmtime()

time.struct_time(tm_year=2024, tm_mon=12, tm_mday=4, tm_hour=3, tm_min=20, tm_sec=10, tm_wday=2, tm_yday=339, tm_isdst=0)

In [22]:
time.localtime()

time.struct_time(tm_year=2024, tm_mon=12, tm_mday=3, tm_hour=22, tm_min=23, tm_sec=2, tm_wday=1, tm_yday=338, tm_isdst=0)

In [24]:
from time import gmtime, strftime
strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())

'Wed, 04 Dec 2024 03:23:59 +0000'
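
Parsing is the reverse of formatting. The sketch below feeds the string shown above back through `time.strptime()` with the same format; it assumes an English (or C) locale, because `%a` and `%b` are locale-dependent.

```python
import time

stamp = "Wed, 04 Dec 2024 03:23:59 +0000"
fmt = "%a, %d %b %Y %H:%M:%S +0000"

# Parse the text into a struct_time.
parsed = time.strptime(stamp, fmt)
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)  # 2024 12 4
print(parsed.tm_hour, parsed.tm_min, parsed.tm_sec)   # 3 23 59

# Format it back; the round trip reproduces the original string.
round_trip = time.strftime(fmt, parsed)
print(round_trip)  # Wed, 04 Dec 2024 03:23:59 +0000
```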