<a href="https://colab.research.google.com/github/statrliu/data-science-letures/blob/main/Introduction_to_Python_for_Data_Science_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Full Cycle of a Data Science Project**
  1. Defining the Problem to Solve 
  2. Collecting Data
  3. Manipulating Data
  4. Building, Evaluating, and Selecting Models
  5. Delivering Results 

# **Overview of Data Manipulation**
+ Cleaning the data
  + Investigating errors/inconsistency/outliers
  + Investigating missing values 
+ Formatting and combining multiple data sets
+ Generating new features




# **Python for Data Science**


In [3]:
# Check Python version
import sys
sys.version

'3.8.10 (default, Nov 14 2022, 12:59:47) \n[GCC 9.4.0]'

## _Object and Type (Class)_

**Object:** An object can hold data (attributes) and perform operations (methods). For example:
+ An integer object can hold integer values and support arithmetic operations such as addition and subtraction. 
+ A string object can hold text values and support operations such as concatenation and substring extraction.

**Type/Class:** A Python type is a classification of objects based on their characteristics and the operations they support. 

***In Python, almost everything is an object.***

***task***: Find the type of an object
```
# Using the built-in type() function
type(10)   # int
type(1.5)  # float
```

***task***: List a type's or object's attributes (both data and methods).
```
dir(10)
dir(int)
```

***task***: Access an object's data/method using `.` notation 
```
(10).real
(10).__str__() # same as str(10)
```

**Variables (Object reference)**

In Python, if you create a variable `x` and assign it a value of 25, Python creates an integer object with a value of 25 and assigns a reference to that object to the variable `x`. You can then perform operations on that object, such as adding `x` to another integer.
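To make the reference idea concrete, the sketch below binds two names to one object and compares identities with `is`. The variable names are illustrative only; a list is used in the second half because it is mutable, which makes the shared reference easy to observe.

```python
# Two names can refer to the same object.
x = 25
y = x                  # y now references the same int object as x
same_int = x is y      # True: assignment copies the reference, not the object

a = [1, 2, 3]
b = a                  # b references the same list object as a
b.append(4)            # the change is visible through both names

c = list(a)            # list() builds a new, separate list object
print(same_int, a, a is b, a is c)
```

Here `a is b` holds because both names point to one list, while `c` is an equal but distinct object.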

***task:*** Find the unique id (the memory address, in CPython) of an object.
```
# Using the built-in id() function
id(20)
id("data")
```

**Assignment Statement:** 

```
small_int = 10 # snake case
largeFloat = 1234567.89 # camel case
HighVolume = 6789 # pascal case
```

**Naming rules:**

+ A variable name must start with a letter or the underscore character

+ A variable name cannot start with a number

+ A variable name can only contain alphanumeric characters and underscores (`A-Z`, `a-z`, `0-9`, and `_`)

+ Variable names are case-sensitive (`sd` and `SD` are different )

+ Use meaningful names (`counter` is better than `c`)
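The naming rules above can be checked programmatically. The following is a minimal sketch using `str.isidentifier()` and the standard `keyword` module; the candidate names are made up for illustration. It also rejects reserved words such as `class`, which is an extra restriction not listed above.

```python
import keyword

# Hypothetical candidate names, chosen to exercise each rule.
candidates = ["small_int", "_hidden", "2fast", "high-volume", "class"]

def is_valid_name(name):
    # str.isidentifier() checks the character rules;
    # keyword.iskeyword() flags reserved words such as "class".
    return name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(f"{name!r}: {'valid' if is_valid_name(name) else 'invalid'}")

valid_names = [n for n in candidates if is_valid_name(n)]
```

Only `small_int` and `_hidden` pass: `2fast` starts with a digit, `high-volume` contains a hyphen, and `class` is a keyword.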






## _Fundamental Data Types_

### **Integer and Float**

#### *Arithmetic Operators*

#### *Augmented Assignment Operators*


Floating-point numbers on a computer are only approximations of real numbers.
(https://docs.python.org/3/tutorial/floatingpoint.html)

```
0.1 + 0.1 + 0.1 == 0.3  # False
format(0.1, '.20f')     # '0.10000000000000000555'
``` 

When comparing floats, don't use `==` to test for equality. Use the `isclose` function in the `math` module instead.
```
import math
math.isclose(18/3, 6, rel_tol = 0.001, abs_tol = 0.001) # True
```
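As a further illustration, the sketch below contrasts `==` with `math.isclose` on the `0.1 + 0.1 + 0.1` case and shows why `abs_tol` matters when comparing against zero. The variable names are illustrative only.

```python
import math

total = 0.1 + 0.1 + 0.1

exact_equal = (total == 0.3)             # False: rounding error in the sum
close_enough = math.isclose(total, 0.3)  # True with the default rel_tol=1e-09

# A relative tolerance alone never matches 0.0, so abs_tol is needed near zero.
near_zero_default = math.isclose(1e-12, 0.0)            # False
near_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)  # True

print(exact_equal, close_enough, near_zero_default, near_zero_abs)
```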

### **None Type**
`None` is the only value of the `NoneType`. 
+ `None` can be used to represent missing values.
+ If a function does not have a `return` statement, the function will automatically return `None`.
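Both points can be seen in a short sketch. The function and variable names here are hypothetical, chosen only for the example.

```python
def log_message(msg):
    # No return statement, so calling this function returns None.
    print(msg)

result = log_message("hello")

# None is a singleton, so an identity check with `is` is the usual test.
missing_age = None
label = "unknown" if missing_age is None else str(missing_age)
print(result is None, label)
```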

### **Bool**

Bool type has two values: `True, False` 
  + `True` and `False` are objects. You can use `id(True)` and `id(False)` to check their addresses.
  + Type of `True, False` is `bool`
    ```
    type(True)
    Out: bool
    ```
  + Type conversion (casting)
    + The condition of an `if` or `while` statement is implicitly converted to the `bool` type.

      ```
      name = ""
      if name: # empty string is evaluated as False.
         print(name)
      else:
         print("name is empty")

      Out: name is empty       
      ```

    + Use `bool()` for explicit conversion.
    + Truthy and Falsey. 
      + An empty `string` is evaluated as `False` when used in a conditional statement, so it is Falsey. Other string values are evaluated as `True`, so they are Truthy.
        ```
        bool("")
        Out: False

        bool(" ") # one space
        Out: True
        ``` 
      + For numbers like `int, float`, zero is converted to `False`, other values are Truthy.
        ```
        bool(0) # int 
        Out: False

        bool(0.0) # float 
        Out: False
        ``` 
      + Empty `tuple, list, set, dict` are Falsey      
      + `None` object is Falsey.
      + In general, the dunder method `__bool__()` is called when a conversion to the `bool` type is invoked. 
        ```
        bool(10)
        Out: True

        # It is equivalent to:
        (10).__bool__()
        Out: True
        ``` 

        + So for a custom class, we can implement `__bool__()` to tell Python how to convert an instance of our class to the `bool` type.

          ```
          class FalseClass:
              def __bool__(self):
                  return False
                  
          tmp_ins = FalseClass()
          bool(tmp_ins)
          Out: False        
          ``` 
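The truthiness rules above can be checked in one pass. The sketch below also shows one case the list does not mention: when a class defines no `__bool__()`, `bool()` falls back to `__len__()`. The `Basket` class is hypothetical and exists only for this example.

```python
falsy_values = ["", 0, 0.0, (), [], set(), {}, None]
truthy_values = [" ", 1, -0.5, (0,), [0], {0}, {"k": 0}]

all_falsy = not any(bool(v) for v in falsy_values)
all_truthy = all(bool(v) for v in truthy_values)

class Basket:
    """Hypothetical container with no __bool__; bool() falls back to __len__."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

empty_basket = bool(Basket([]))         # len is 0, so this is False
full_basket = bool(Basket(["apple"]))   # len is 1, so this is True
print(all_falsy, all_truthy, empty_basket, full_basket)
```

Note that a container holding a single Falsey element, such as `[0]`, is still Truthy because only its length matters.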


#### *Comparison Operators*

#### *Logical Operators*

### **Collection**

#### *Sequence Type*
A sequence contains an **ordered** list of objects. All sequence types implement the `__getitem__()` method, which is what the index operator `[]` calls to extract objects.
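To show how `[]` maps onto `__getitem__()`, here is a minimal sketch of a custom class. `Weekdays` is a hypothetical name, not part of Python; it delegates indexing to an internal tuple.

```python
class Weekdays:
    """Hypothetical minimal sequence: only __getitem__ and __len__ are defined."""
    _names = ("Mon", "Tue", "Wed", "Thu", "Fri")

    def __len__(self):
        return len(self._names)

    def __getitem__(self, index):
        # `index` is an int for days[0] and a slice object for days[1:3];
        # delegating to the tuple handles both cases.
        return self._names[index]

days = Weekdays()
first = days[0]      # calls days.__getitem__(0)
last = days[-1]      # calls days.__getitem__(-1)
middle = days[1:3]   # calls days.__getitem__(slice(1, 3, None))
print(first, last, middle)
```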

##### **String**

##### **List**
Variable-length, **mutable**, heterogeneous sequence.


***task:*** Create a list
```
# Using []
list_1 = [1, 2, 3]
    
# Using list(obj) to convert an iterable object
# (any object that can be looped over, such as a tuple or a range) to a list.
list_1 = list((1, 2, 3))
list_2 = list(range(5))
```

***task:*** Access elements of a list
```
list_1 = [1, 2, 3, 4, 5]
# Using [] with a single index:
    
list_1[0] # 1. Returns a single object, not a list.
list_1[-2] # 4. A negative index is converted by adding the length of the list: -2 becomes 5 + (-2) = 3
  
list_1[10] # IndexError: list index out of range
```




We can also use `[]` with a slice object.

###### *Slice Object*
    
* In Python, a slice object is used to define a range of elements to extract from a sequence, such as a string, list, or tuple.
 
* A slice object is created using the built-in `slice()` function, and it takes three arguments: start, stop, and step.

```
my_list = [1, 3, 5, 7, 9, 11, 13]
my_slice = slice(2, 5)  # start=2, stop=5, step=None
my_list_slice = my_list[my_slice]
print(my_list_slice)  # [5, 7, 9]

# You can also use shorthand notation to create a slice object:
my_list_slice = my_list[2:5]
print(my_list_slice)  # [5, 7, 9]
```



    


More Examples:

```
 
list_1 = [1, 2, 3, 4, 5]

list_1[0:4:2] # [1, 3]
# equivalent to: 
list_1[slice(0, 4, 2)]

list_1[0:1] # [1] returns a list!

list_1[2:4] # [3, 4]  
# equivalent to:
list_1[slice(2, 4)]

list_1[1:] # [2, 3, 4, 5]
# equivalent to:
list_1[slice(1, None, None)]

list_1[1::2] # [2, 4]
list_1[:3] # [1, 2, 3]
list_1[:3:2] # [1, 3]

list_1[1:-2] # <=> list_1[1:3] -> [2, 3]
list_1[3:1] # <=> list_1[3:1:1] -> []

## The following are less common
list_1[-9::1] # <=> list_1[0:5:1] => [1, 2, 3, 4, 5]
list_1[-9:-8:1] # <=> list_1[0:0:1] => []

list_1[10::-1] # <=> list_1[4::-1] => [5, 4, 3, 2, 1]
list_1[10:9:-1] # <=> list_1[4:4:-1] => []
list_1[-6::-1] # []

list_1[3:1:-1] # [4, 3] A negative step value means moving backward.
list_1[0:4:-1] # []
```

When a slice is applied to a sequence, Python computes effective start and stop values from the length of the sequence.

```
seq[i:j] (or seq[i:j:k] with k > 0)
if i > len(seq) -> len(seq);  if j > len(seq) -> len(seq)
if i < 0 -> max(0, len(seq) + i);  if j < 0 -> max(0, len(seq) + j)

if i omitted or None -> 0
if j omitted or None -> len(seq)
```

```
seq[i:j:k] with k < 0 
if i >= len(seq) -> len(seq) - 1;  if j >= len(seq) -> len(seq) - 1
if i < 0 -> max(-1, len(seq) + i);  if j < 0 -> max(-1, len(seq) + j)

if i omitted or None -> len(seq) - 1
if j omitted or None -> -1 (a position before index 0, not the negative index -1)
```
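These effective values do not have to be worked out by hand: a slice object's `indices(length)` method returns the `(start, stop, step)` triple for a sequence of the given length. The sketch below applies it to a few of the less common slices; the list and the chosen slices are illustrative only.

```python
seq = [1, 2, 3, 4, 5]

examples = [
    slice(-9, None, 1),    # start far out of bounds on the left
    slice(10, None, -1),   # start out of bounds on the right, moving backward
    slice(10, 9, -1),      # both bounds out of range, moving backward
    slice(-6, None, -1),   # start before the first element, moving backward
]

for s in examples:
    start, stop, step = s.indices(len(seq))
    # range(start, stop, step) visits exactly the indices the slice selects.
    picked = [seq[i] for i in range(start, stop, step)]
    assert picked == seq[s]
    print(s, "->", (start, stop, step), picked)

effective = [s.indices(len(seq)) for s in examples]
```

A stop of `-1` in the returned triple is meant to be passed to `range()`. Writing it back into `seq[start:-1:step]` would be read as a negative index and give a different result.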



###### *Operations on one list*
  
***task:*** Add one element to a list
+ Append an element
  ```
  list_1 = [1, 2, 3, 4, 5]
  list_1.append(6) 
  # In-place method; returns None.
  # Appends 6 to the end of the sequence.
  
  list_1 # [1, 2, 3, 4, 5, 6]
  ```
+ Insert an element  
  ```  
  list_1 = [1, 2, 3, 4, 5]
  list_1.insert(1, 'new') 
  # In-place method; returns None.
  # Inserts an element before the given index.

  list_1 # [1, 'new', 2, 3, 4, 5]

  list_1.insert(10, 'new') 
  # [1, 'new', 2, 3, 4, 5, 'new']
  # If the index is out of bounds on the right-hand side, the element is appended to the end.

  list_1.insert(-1, 'aa') 
  # [1, 'new', 2, 3, 4, 5, 'aa', 'new'] 
  # A negative index is converted to a positive index as usual.

  list_1.insert(-len(list_1), 'test') # ['test', 1, 'new', 2, 3, 4, 5, 'aa', 'new']

  list_1 = [1, 2, 3, 4, 5]
  list_1.insert(-6, "test") 
  # ['test', 1, 2, 3, 4, 5] If the index is out of bounds on the left-hand side, the element is inserted at the beginning of the list.
  ```




###### *Operations on two or more lists*

##### **Tuple**

Important characteristics:
  * Fixed length, **immutable** sequence.
  * Elements can be of different types.
  
***tasks:*** Create a tuple
  + `tup = 1, 2, 3` # called tuple packing
  + `tup = (1, 2, 3)`
  + `tup = tuple([4, 0, 2])` 
    * Use the tuple constructor (`tuple()`) to convert any sequence or iterator (in general, any iterable) to a tuple.
    * `tup = tuple('str') # ('s', 't', 'r')`. A string is a sequence type.
    
  + `tup = (5, )`: the trailing comma is required to create a tuple with a single element.
  + The members of a tuple can be of different types: `(1, 4.5, 'hello')`

***tasks:*** Access one or more elements in a tuple

Using `[]`, Python will call `__getitem__()` method of the object.

  ```
  tup = tuple('hello')
    
  # Using [] with a single index. Indexing starts from 0, not 1.
  tup[1] # 'e'
  tup[10] # IndexError: tuple index out of range
  tup[-1] # 'o'. A negative index is converted by adding the length of the tuple: -1 becomes 5 + (-1) = 4
    
  tup[2, 4] # TypeError: tuple indices must be integers or slices, not tuple
    
  # Using a slice object (slice() accepts positional arguments only)
  tup[slice(0, 3)] # ('h', 'e', 'l')
    
  tup[slice(0, 5, 2)] # ('h', 'l', 'o')
    
  tup[0:5:2] # same as above ('h', 'l', 'o')
  ```

You cannot add, reassign, or delete an element of a tuple, because a tuple is **immutable**.

***note:***
```
tup[1] = 't' # TypeError: 'tuple' object does not support item assignment
del tup[1] # TypeError: 'tuple' object doesn't support item deletion
```
  
**But**, if one element of a tuple is mutable, you can modify the element in-place.
      
```
tup = (10, [3,4,5], 'end')
tup[1].append(6)
tup # (10, [3, 4, 5, 6], 'end')
```

###### *Operations on one or more tuples*

***task:*** Length of the tuple
```
tup = (1,2,3)
len(tup) # 3; equivalent to tup.__len__()
```

***task:*** Check whether an object is in the given tuple
```
## Membership or contains
tup = (1, 2, 3)
1 in tup # True
2 not in tup # False
```

***task:*** Find the index value of a given object in the tuple (or subset of the tuple).
```
## Using index() method.
## s.index(x[, i[, j]]) index of the first 
## occurrence of x in tuple s (at or after index i and before index j)

tup = (4, 5, 6, 5, 7, 8, 7)
tup.index(5) # 1
tup.index(20) # ValueError: tuple.index(x): x not in tuple
tup.index(5, 2) # 3
# tup.index(7, , 5) would raise SyntaxError: the start argument cannot be skipped
tup.index(7, 0, 5) # 4
```  


***task:*** Count frequency of elements in a tuple
```
## Using count method
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2) # 4
a.count(20) # 0
```
    

***tasks:*** Replicate a tuple multiple times
```
## Using * operator
('hello', 'world') * 3 # ('hello', 'world', 'hello', 'world', 'hello', 'world')
```

Note that the objects themselves are not copied, only the references to them.
```
tup = ([1,2], [3,4]) 
tup_2 = tup * 2
print(tup_2) # ([1, 2], [3, 4], [1, 2], [3, 4])

tup[0].append(10)
print(tup_2) # ([1, 2, 10], [3, 4], [1, 2, 10], [3, 4]) 
  
tup_2[0].append(15)
print(tup_2) 
# ([1, 2, 10, 15], [3, 4], [1, 2, 10, 15], [3, 4]) -> the first and the third element share same memory address.
  
print(tup) # ([1, 2, 10, 15], [3, 4])
```
  
    



***tasks:*** Concatenate two or more tuples
```
## Using + operator
(1, 2, 3) + (4, 5) # (1, 2, 3, 4, 5)
``` 

###### *Tuple Unpacking*   
  
***task:*** Multiple assignments  
```
## assign a tuple to several variables
tup = (1,2,3)
a, b, c = tup # equivalent to a,b,c = 1,2,3
a #1
    
d, e = tup # ValueError: too many values to unpack (expected 2)
    
tup = (1, 2, (3,4))
a,b, (c, d) = tup
d #4
```

***task:*** Swap values of two variable   
```
## Swap the values of two variables using unpacking
a, b = 1, 2
b, a = a, b # The right-hand side is evaluated first, then assigned to the left-hand side.
a, b # (2, 1)
```  
 
  



Tuple unpacking when the `*` operator is on the left-hand side of `=`
```
values = 1,2,3,4,5
a, b, *rest = values
a, b # (1, 2)
rest # [3, 4, 5] it is a list not a tuple!!
    
a, *rest, b = values
a #1
b #5
rest # [2, 3, 4]
```
  
Tuple unpacking when the `*` operator is on the right-hand side of `=`
```
tup = (1,2)
# tup_s = (*tup)
# SyntaxError: can't use starred expression here (parentheses alone do not make a tuple)
# but we can use the following
tup_s = [*tup]
tup_s # [1, 2]
# or
tup_s = (*tup, *tup)
tup_s # (1, 2, 1, 2) 
```    
  
  

#### *Mapping Type (Associative Array)*
A collection of keys and their associated values. Items are looked up by key, so their order does not matter when two mappings are compared.
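As a brief illustration of a mapping, the sketch below uses Python's built-in `dict`. The course codes and counts are made-up example data.

```python
# Hypothetical example data: course codes mapped to enrollment counts.
enrollment = {"STAT101": 120, "STAT202": 45}

enrollment["STAT303"] = 30            # add a new key-value pair
count_202 = enrollment["STAT202"]     # look up a value by its key
missing = enrollment.get("MATH999")   # .get() returns None for an absent key

# Equality ignores the order in which the pairs were written.
same = {"a": 1, "b": 2} == {"b": 2, "a": 1}
print(count_202, missing, same)
```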

##### **Dictionary**


##### **Set**

## *Flow Control*



### **If-Elif-Else Statements**
In Python, the `if`, `elif`, and `else` statements are used for conditional execution of code: they choose which block runs based on the truth value of a condition.

```
if condition:
    # code to be executed if condition is True
elif condition2:
    # code to be executed if condition2 is True and condition is False
else:
    # code to be executed if both condition and condition2 are False

```

Example:
```
x = -2

if x > 0:
    print("x is positive")
elif x < 0:
    print("x is negative")
else:
    print("x is zero")

```





### **For Loop**

#### *Break and Continue*

#### *Finally Block*

#### *Iterable and Iterator in Python*

### **While Loop**


## *Functions*

### **Define a Function**


#### *Parameter vs Argument*

#### *Matching Arguments to Parameters*

#### *Functions Are First-Class Objects*



## *Comprehensions in Python*


### **List Comprehension**

### **Dictionary Comprehension**

### **Generator Expression**

## *Importing Modules*