# DataTipsy Introduction to Python

Python is a very versatile programming language. It can be used for many purposes, including data analysis, web scraping, automating manual tasks, sending email, web development, and even building a simple game.

Before we start coding, there are a few things you need to do to get set up properly.

## Get to Know the Command Line
On Mac, you can use Spotlight Search to search for `Terminal` or `iTerm` and open it. You'll see a cool black window that makes you feel like a hacker in a Hollywood movie. You will need it to open a Jupyter Notebook and to install other tools such as SDKs and Python libraries.

Try typing `ls` (short for "list") and pressing Enter. You'll see the files and folders in your current directory, which is your home directory when the terminal first opens, including `Desktop`. Now try `cd Desktop`, where `cd` stands for "change directory" and is followed by the name of a folder in your current directory. This is just one small example of what the command line can do.

We will use the command line to install pip and Python libraries.

## Getting Ready to Code

The following is a summary of what we need to do before coding.

### 1. Download Python
Official download source: https://www.python.org/downloads/

### 2. Install PIP
Official installation document: https://pip.pypa.io/en/stable/installation/

- Download the script from https://bootstrap.pypa.io/get-pip.py.
- Open a terminal/command prompt, `cd` to the folder containing the `get-pip.py` file and run:

For Linux and macOS:
```
python get-pip.py
```

For Windows:
```
py get-pip.py
```

Then run this to test if you have installed it successfully:
```
pip --version
```
You should see the installed pip version printed.

### 3. Install Jupyter Notebook

```
pip install jupyterlab
```

### 4. Open Jupyter Notebook
Now you can create a new project folder either manually or using `mkdir`.

Here is a sample command to create a folder called `my_folder`:
```
mkdir my_folder
```

Then `cd` into your folder:

```
cd my_folder
```

And open Jupyter Notebook:
```
jupyter notebook
```
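Now you should see a page in your browser where you can create folders and notebooks. If the `jupyter notebook` command is not found, run `pip install notebook` first, or start JupyterLab with `jupyter lab` instead.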

### Working with Jupyter Notebook

The tool we're using here is called a Jupyter notebook (.ipynb), or simply a notebook. Each block is called a cell. A cell can be a Markdown cell (for text) or a code cell. This cell is a Markdown cell.

You can run your code cell by cell, which is great for experiments that require lots of trial and error. If you use a Python script (.py), you typically run the whole file at once.

Note that a notebook can be exported to an HTML file for quick viewing, but the HTML file can't be edited as a notebook. If you want to edit the content of the notebook, you need to open the .ipynb file with Jupyter Notebook, started from the command line.

In [1]:
# This is a Code cell
# We can put a hash sign (#) in front of a line to make it a comment. The program will ignore it.
# You can run this cell and it will show nothing because these 3 lines are all comments.

Now let's use the `print` function to print the text `Hello World`.

In [2]:
print("Hello World")

Hello World


You can also run the cell below and see a string displayed, but this is a notebook feature only. If you're using a Python script (.py), the string won't be printed, so it's better to use `print`.

In [3]:
"Hello world1"
"Hello World2"

'Hello World2'

Also, notice that the notebook only displays the value of the last line of code in the cell. The other lines are not displayed. To show every line, you need multiple `print` calls.

In [4]:
print("Hello world1")
print("Hello world2")

Hello world1
Hello world2


## Data Structure

Python has many data types and data structures. These are some of the common ones:
- string
- float
- int
- boolean
- list
- dict
- set

In [5]:
"This is a sample string"

'This is a sample string'

In [6]:
# float
3.14

3.14

In [7]:
# int (integer)
2

2

In [8]:
# boolean (True or False)
print(True)
print(False)

True
False


In [9]:
# list
print(["element1", "element2"])

print([1,2,3])

['element1', 'element2']
[1, 2, 3]


In [10]:
# dict (dictionary)
dict({"key1": 1, 
      "key2": "value2"})

{'key1': 1, 'key2': 'value2'}

In [11]:
# set
print(set([1,1,2,2,2,3]))

print({1,2,3})

{1, 2, 3}
{1, 2, 3}


## Value Assignment

We can assign a string, integer, float, or any other data structure to a variable.

In [12]:
my_text = "This is my string"

my_int = 9

my_float = 3.14

my_boolean = True

my_list = ["text1", "text2", 9]

my_contact = {"Mr.A": "099-999-7777", 
              "Mr.B": "089-999-8989"}

In [13]:
# now we can conveniently use the variable
print(my_contact)

{'Mr.A': '099-999-7777', 'Mr.B': '089-999-8989'}


In [14]:
# check type
type(my_contact)

dict

In [15]:
# calculation
my_int + my_float

12.14

## Int and Float

In [16]:
a = 10
b = 13.1
c = 2

a + b

23.1

In [17]:
a - b

-3.0999999999999996
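
The result above is not exactly `-3.1` because floats are stored in binary, and most decimal fractions can only be approximated. A minimal sketch of two common ways to handle this, using the built-in `round` and the standard library's `math.isclose` (the variable name `difference` is only for illustration):

```python
import math

a = 10
b = 13.1

difference = a - b

# round to 2 decimal places when you only need a tidy number to display
print(round(difference, 2))

# comparing floats with == can surprise you, so compare with a tolerance instead
print(difference == -3.1)              # False
print(math.isclose(difference, -3.1))  # True
```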

In [18]:
a * c

20

In [19]:
# exponentiation (to the power of n)
a ** c

100

In [20]:
# division
a / c

5.0

In [21]:
# modulo (getting the remainder after division)
a % c

0
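
Division with `/` always returns a float, even when the result is a whole number. As a small sketch, floor division `//` and the built-in `divmod` give the whole-number quotient instead (the names `total` and `group_size` and their values are just sample data):

```python
total = 7
group_size = 2

quotient = total // group_size    # floor division drops the fractional part
remainder = total % group_size    # modulo keeps what is left over
both = divmod(total, group_size)  # (quotient, remainder) in one call

print(quotient, remainder, both)
```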

## String / Regex

In [22]:
sample_string = "This seems to be a very long string"

In [23]:
# retrieving a character from a string by index
sample_string[0]

'T'

In [24]:
# string slicing (the first 2 characters, at index 0 and 1)
sample_string[:2]

'Th'
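
Slicing follows the pattern `text[start:stop]`, where `start` is included and `stop` is not. A short sketch with a few more slices of the same string (the variable names are only for illustration):

```python
sample_string = "This seems to be a very long string"

first_word = sample_string[:4]     # start omitted, so it begins at index 0
second_word = sample_string[5:10]  # index 5 up to, but not including, index 10
last_word = sample_string[-6:]     # a negative index counts from the end

print(first_word, second_word, last_word)
```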

In [25]:
# split a string into list
# by default, this will split on whitespace
sample_string.split()

['This', 'seems', 'to', 'be', 'a', 'very', 'long', 'string']

In [26]:
# split by a specific character or word
sample_string.split("very long")

['This seems to be a ', ' string']
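
The reverse of `split` is `join`, which glues a list of strings back together with a separator of your choice. A minimal sketch using the same sample string:

```python
sample_string = "This seems to be a very long string"

words = sample_string.split()

# the separator string goes first, the list goes inside join()
rebuilt = " ".join(words)
dashed = "-".join(words)

print(rebuilt)
print(dashed)
```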

In [27]:
# lower
sample_string.lower()

'this seems to be a very long string'

In [28]:
# upper 
sample_string.upper()

'THIS SEEMS TO BE A VERY LONG STRING'

In [29]:
# string methods return a new string, so assign the result to a variable, otherwise it is lost
# you can see here that if we print the variable sample_string, it is still the same
sample_string

'This seems to be a very long string'

In [30]:
# new variable assignment
new_sample_string = sample_string.upper()

print(new_sample_string)

THIS SEEMS TO BE A VERY LONG STRING


In [31]:
# insert a variable into a string with an f-string
description = "super cool"

f"This is a {description} string"

'This is a super cool string'

In [32]:
# another method for string formatting
"This is a {} string".format("short")

'This is a short string'
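
f-strings can also control how numbers are shown, which is handy when reporting results. A small sketch with made-up values (`price` and `share` are hypothetical):

```python
price = 1234.5
share = 0.5

two_decimals = f"Price: {price:.2f}"     # fixed to 2 decimal places
with_commas = f"Price: {price:,.2f}"     # adds a thousands separator
as_percent = f"Share: {share:.0%}"       # multiplies by 100 and adds %

print(two_decimals)
print(with_commas)
print(as_percent)
```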

### Edit Distance / String Mapping

In [33]:
# to be mentioned in class
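
Until this is covered in class, here is a rough sketch of one common definition of edit distance, the Levenshtein distance: the smallest number of single-character insertions, deletions, and substitutions needed to turn one string into another. The function name `edit_distance` and the sample words are assumptions for illustration, and the class may use a library instead of hand-written code.

```python
def edit_distance(source, target):
    """Levenshtein distance between two strings (plain Python sketch)."""
    # distances from an empty source prefix to each target prefix
    previous_row = list(range(len(target) + 1))

    for i, source_char in enumerate(source, start=1):
        current_row = [i]  # turning i source characters into an empty target
        for j, target_char in enumerate(target, start=1):
            insert_cost = current_row[j - 1] + 1
            delete_cost = previous_row[j] + 1
            # substitution is free when the two characters already match
            substitute_cost = previous_row[j - 1] + (source_char != target_char)
            current_row.append(min(insert_cost, delete_cost, substitute_cost))
        previous_row = current_row

    return previous_row[-1]


print(edit_distance("kitten", "sitting"))
print(edit_distance("Bangkok", "Bangkk"))
```

A small distance suggests two strings are probably the same value with a typo, which is the idea behind fuzzy string mapping.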

## List

In [34]:
list_of_string = ["string-1", "string-2"]
list_of_float = [3.14, 1.61]
list_of_mixed = ["string-1", 3.14, dict({"a": "b"})]

In [35]:
# retrieving an element of a list by index
print(list_of_string[0])
print(list_of_string[1])

string-1
string-2


In [36]:
# add to a list
list_of_float.append(2.71)

print(list_of_float)

[3.14, 1.61, 2.71]


In [37]:
# remove from a list
list_of_float.remove(2.71)

print(list_of_float)

[3.14, 1.61]
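
Two details of `remove` are worth knowing: it deletes only the first matching value, and it raises a `ValueError` if the value is not in the list. A minimal sketch (the list `scores` is made-up sample data):

```python
scores = [3.14, 1.61, 3.14]

# only the first 3.14 is removed, the second one stays
scores.remove(3.14)
print(scores)

# check with "in" first to avoid an error when the value is missing
value_to_remove = 2.71
if value_to_remove in scores:
    scores.remove(value_to_remove)
else:
    print(f"{value_to_remove} is not in the list")
```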


In [38]:
# count the number of elements in a list
len(list_of_float)

2

## Set

You can think of a set as a list that holds no duplicates and has no order. As with sets in mathematics, we can take the intersection and union of two or more sets.

In [39]:
# sample set
{1,2,3}

{1, 2, 3}

In [40]:
# convert a list into set
set([1,1,2,3])

{1, 2, 3}

In [41]:
# intersection of two sets
A = {1,2,3}
B = {2,3,4}
C = {1,4}

A.intersection(B)

{2, 3}

In [42]:
# union
A.union(B)

{1, 2, 3, 4}

In [43]:
# difference (elements in A that are not in B)
A - B

{1}

In [44]:
B - A

{4}
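
Just as `-` is the operator form of set difference, intersection and union have operator forms too. A short sketch using the same two sets (the result variable names are only for illustration):

```python
A = {1, 2, 3}
B = {2, 3, 4}

common = A & B        # same as A.intersection(B)
combined = A | B      # same as A.union(B)
only_in_one = A ^ B   # symmetric difference: in A or B, but not both

print(common, combined, only_in_one)
```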

In [45]:
# count the elements in a set
len(A)

3

In [46]:
# dealing with more than 2 sets
A = {"a","b","c"}
B = {"a","c"}
C = {"b","c"}

set.intersection(*[A,B,C])

{'c'}

In [47]:
set.union(*[A,B,C])

{'a', 'b', 'c'}

Sets are used a lot in the data mapping step. Imagine you have 3 tables to be joined on one column, and you want to check what percentage of the values actually match. You can use the intersection of all the sets, and of each pair of sets, to find more details about the matches.
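
As a rough sketch of that idea, assume each table's join column has already been turned into a set. The table names and IDs below are made up for illustration:

```python
# hypothetical join-column values from three tables
order_ids = {"c1", "c2", "c3", "c4"}
customer_ids = {"c2", "c3", "c4", "c5"}
payment_ids = {"c3", "c4", "c6"}

# IDs present in every table, and IDs present in at least one table
in_all_tables = set.intersection(order_ids, customer_ids, payment_ids)
in_any_table = set.union(order_ids, customer_ids, payment_ids)

match_rate = len(in_all_tables) / len(in_any_table)
print(f"{len(in_all_tables)} of {len(in_any_table)} IDs match in all tables ({match_rate:.0%})")

# pairwise checks show where the mismatches come from
print("orders & customers:", order_ids & customer_ids)
print("only in payments:", payment_ids - order_ids - customer_ids)
```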

## Dict
A dictionary is organized as key-value pairs.

In [48]:
sample_dict = {"key1": "value1", "key2": "value2"}

sample_dict

{'key1': 'value1', 'key2': 'value2'}

In [49]:
# retrieve a value by key
sample_dict["key1"]

'value1'
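
Looking up a key that does not exist with square brackets raises a `KeyError`. A minimal sketch of two safer options, the `get` method and the `in` check (the key `"key9"` is a made-up missing key):

```python
sample_dict = {"key1": "value1", "key2": "value2"}

found = sample_dict.get("key1")               # works like sample_dict["key1"]
missing = sample_dict.get("key9")             # returns None instead of an error
fallback = sample_dict.get("key9", "not found")  # or a default you choose

has_key = "key9" in sample_dict               # check before looking up

print(found, missing, fallback, has_key)
```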

In [50]:
# create a new key-value pair in an existing dictionary
sample_dict["key3"] = "value3"

sample_dict

{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

In [51]:
# count the number of key-value pairs
len(sample_dict)

3

In [52]:
# be creative when using a dictionary, the values don't have to be strings
mixed_dict = {
    "customer_id": "c123",
    "most_favorite_products": ["apple", "mango", "strawberries"],
    "RFM": {
        "recency": 0.8,
        "frequency": 0.2,
        "monetary": 0.3
    }
}

# in a notebook, we can use the display function as an alternative to print as well
# it's prettier, especially for dicts and dataframes
display(mixed_dict)

{'customer_id': 'c123',
 'most_favorite_products': ['apple', 'mango', 'strawberries'],
 'RFM': {'recency': 0.8, 'frequency': 0.2, 'monetary': 0.3}}
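
To read a value inside a nested structure, chain the lookups from the outside in. A short sketch using the same dictionary (the new frequency value `0.25` is made up):

```python
mixed_dict = {
    "customer_id": "c123",
    "most_favorite_products": ["apple", "mango", "strawberries"],
    "RFM": {"recency": 0.8, "frequency": 0.2, "monetary": 0.3},
}

# key first, then list index
top_product = mixed_dict["most_favorite_products"][0]

# key first, then the inner dictionary's key
recency = mixed_dict["RFM"]["recency"]

# the same chain works for updating a nested value
mixed_dict["RFM"]["frequency"] = 0.25

print(top_product, recency, mixed_dict["RFM"])
```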

### JSON / YAML

In [53]:
{
'apikey': 'dfdsafkdl;jsjl',
'secret': 'sdhjfkdshafjklh'
}

{'apikey': 'dfdsafkdl;jsjl', 'secret': 'sdhjfkdshafjklh'}
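
A dictionary like this maps naturally to JSON text. As a minimal sketch, the standard library's `json` module converts in both directions (the variable names are only for illustration). YAML works in a similar way but needs a third-party package, so it is not shown here.

```python
import json

config = {
    "apikey": "dfdsafkdl;jsjl",
    "secret": "sdhjfkdshafjklh",
}

# dict -> JSON text (indent makes it easier to read)
json_text = json.dumps(config, indent=2)
print(json_text)

# JSON text -> dict
restored = json.loads(json_text)
print(restored == config)
```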

## Git

In [54]:
# to show overview of how it works

## Docker

In [55]:
# to mention its functionality in class

## Procedures

### IF-ELSE

In [56]:
x = 10

# if-else
if x > 10:
    print("x is more than 10")
elif x > 5:
    print("x is more than 5 but not more than 10")
else:
    print("x is less than or equal to 5")

x is more than 5 but not more than 10


### For Loop

In [57]:
small_list = [1,2,3,4]

for i in small_list:
    print(i)

1
2
3
4
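
Two built-ins often appear alongside a for loop: `enumerate`, which gives the position together with each element, and `range`, which generates a sequence of numbers. A small sketch (the names `position` and `squares` are only for illustration):

```python
small_list = [1, 2, 3, 4]

# enumerate yields (position, element) pairs; start=1 counts from 1 instead of 0
for position, value in enumerate(small_list, start=1):
    print(f"item {position} is {value}")

# range(1, 5) yields 1, 2, 3, 4 (the stop value is not included)
squares = []
for n in range(1, 5):
    squares.append(n ** 2)

print(squares)
```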


In [58]:
# a for loop over a dictionary will show the keys
small_dict = {"a": 1.1, "b": 2.2, "c": 3.3}

for i in small_dict:
    print(i)

a
b
c


In [59]:
# if you want the value of the dict, you need to do as follows
# note that we can replace i with key or anything, we just need to be consistent within the loop

for key in small_dict:
    print(small_dict[key])

1.1
2.2
3.3
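
When you need both the key and the value, the dictionary's `items()` method gives you each pair directly. A minimal sketch with the same dictionary (the list `pairs` is only there to collect the results):

```python
small_dict = {"a": 1.1, "b": 2.2, "c": 3.3}

pairs = []
for key, value in small_dict.items():
    print(key, value)
    pairs.append((key, value))
```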


In [60]:
# a mix of for loop and if-else
threshold = 2

for key in small_dict:
    
    if small_dict[key] < threshold:
        print(f"The value of the key {key} is LESS than {threshold}")
    else:
        print(f"The value of the key {key} is MORE than or equal to {threshold}")

The value of the key a is LESS than 2
The value of the key b is MORE than or equal to 2
The value of the key c is MORE than or equal to 2


## Seaborn

In [61]:
# to show a sample in class

There are other procedures such as while loops, try-except, and more. But let's start with if-else and for loops first.

## Mini Problem Solving Practice
In the real world, you will need to decide how to structure your data. For example, if you are asked to calculate a GPA, you need to think about how to store the grade for each subject and for each student. There is no right or wrong answer. You can give it a try and change it later.

The more you practice, the better you become at guessing the data structure that paves the way to cleaner code.

1. My favorite movies

Keep two movies in a list, then print "My favorite movies are ___ and ___".

2. GPA Calculation