# Introduction to Python & Jupyter
### <font color=indigo>A Half-Day Python Seminar</font>

# Part 1: Notebooks

## What is a notebook?
* combines
    * formula
        * python
    * values
    * documentation
        * graphs
        * commentary
* notebooks are lab journals
    * thought process
    

## For example,

##### Import the iris flowers dataset

In [2]:
import seaborn as sns

flowers = sns.load_dataset('iris')

##### How many flowers are there?

In [4]:
flowers['species'].value_counts()

virginica     50
setosa        50
versicolor    50
Name: species, dtype: int64

##### Three types of flower -- how do they differ?

In [5]:
flowers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


##### How do they differ in their `petal_length` ?

In [6]:
flowers['petal_length'].groupby(flowers['species']).mean()

species
setosa        1.462
versicolor    4.260
virginica     5.552
Name: petal_length, dtype: float64

...and so on...

## Why would I prefer to use a notebook over, e.g., Excel?

* python
* commentary / thought process

# Part 2: About Python

## What is Python?

* programming language
    * a language for human beings
    * which is converted into instructions
    * machines can follow
* two types of command
    * calculations
        * compute using data
        * variables to hold data
        * operations to work on data
    * actions
        * change devices
        * keyboards, screens, ...
        * decisions
        * reading files, writing to the screen...
* syntax 
    * grammar, symbols
    * compromise between human-readable
    * machine-translatable
    * case-sensitive
    * all symbols matter...
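
As a small illustration of case sensitivity, `age`, `Age` and `AGE` are three different names to Python. The variable names and values here are made up for the example,

```python
age = 18      # defines the name `age`
Age = 21      # a different name: the capital letter matters

print(age)    # 18
print(Age)    # 21

# Using a name that was never defined raises a NameError
try:
    print(AGE)
except NameError as error:
    message = str(error)
    print("NameError:", message)
```

The `try` / `except` is only there so the example keeps running after the error; in a notebook cell the `NameError` would simply be displayed.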

## Why Python?
* early 90s
* heavy educational focus
    * make the language read simply
    * (this seems to have been abandoned)
* some use in academia in 90s
* this led to research/numerical tools being developed for python
* primarily a software engineering language
    * not a language for data processing

It looks quite pleasant when handling software engineering,

In [23]:
age = 18

if age >= 18:
    print("ALLOWED")
else:
    print("DENIED")

ALLOWED


"Software Engineering", i.e., controlling devices (networking, user input, screens, ...)

Data Scientists care more about calculations, and python looks less clear here,

In [29]:
import numpy as np

X = np.random.normal(0, 1, size=(4, 4))

(X[:, 0] - X[:, -1]).mean()

0.15232289410704708

In [206]:
X.mean()

-0.19298839193667217

```
X = Matrix(4 by 4) of NormalRandomNumbers()
MEAN(FirstColumn of X - LastColumn of X)
```

Data science tools were tacked on by academics, without the educational focus of the core language.

The primary industrial users of Python are software engineering companies (Google, etc.); in these settings it is useful for the data language to be the same as the software engineering language.

This allows your data people to share code with your software people.

In [30]:
import shared

In [32]:
table = shared.read_dataset()

In [33]:
table.mean()

total_bill    20.3905
tip            3.2680
size           2.5900
dtype: float64

# Part 3: Introduction to Python

## How do I define a simple piece of data?

By default every piece of data is "singular",

`int`,

In [35]:
5

5

`float`,

In [41]:
4.3

4.3

...digital machines cannot store most fractional numbers exactly, so a `float` is usually a close approximation.
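
A minimal sketch of what this approximation looks like in practice. The values `0.1`, `0.2` and `0.3` are chosen only for illustration; `math.isclose` is the standard-library way to compare floats within a tolerance,

```python
import math

total = 0.1 + 0.2

print(total)                      # 0.30000000000000004
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True
```

For this reason, comparing floats with `==` is usually avoided in favour of a tolerance-based check.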

`str`, text,

In [36]:
"Hello"

'Hello'

`bool`, 

In [38]:
True

True

## How does python know the type?

The syntax determines the type,

In [69]:
type(5)

int

In [68]:
type(5.0)

float

In [70]:
type('5')

str

In [71]:
type(True)

bool

## Why are data types important?

Operations are specialized to data type,

In [43]:
'5' + '5'

'55'

In [44]:
5 + 5

10
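
The same holds for other operators. As a short sketch (the values are illustrative), `*` repeats text but multiplies numbers, and mixing text with a number under `+` is an error, not a guess,

```python
repeated = '5' * 3    # text repeated three times
product = 5 * 3       # numbers multiplied

print(repeated)       # 555
print(product)        # 15

# Adding text to a number raises a TypeError
try:
    '5' + 5
except TypeError as error:
    mixed_error = str(error)
    print("TypeError:", mixed_error)
```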

Floats are approximate, whereas ints are exact,

In [114]:
(0.3 ** 3)

0.026999999999999996

In [116]:
(3 ** 3) / (10 ** 3)

0.027

## How do I store data in memory?

In [42]:
a = 5 # variable

In [46]:
my_name = "Michael"

In [47]:
my_name

'Michael'

...this does not store anything on disk, i.e., it's not saving: the variable only lasts while the notebook is running.
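
To keep a value beyond the current session it has to be written to a file explicitly. One possible sketch, using the standard-library `json` module; the file name `my_name.json` and the temporary folder are assumptions made for this example,

```python
import json
import os
import tempfile

my_name = "Michael"   # lives in memory only

# A temporary folder is used so the example leaves nothing behind;
# the file name is made up for illustration
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_name.json")

    with open(path, "w") as f:
        json.dump(my_name, f)        # write the value to disk

    with open(path) as f:
        restored = json.load(f)      # read it back from disk

print(restored)   # Michael
```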

## How do I define datasets?

A `list` is a mutable (editable), ordered collection of data (aka a column),

In [48]:
events = [
    "LOAN_APPLICATION_1", # comma = element
    "LOAN_APPLICATION_2",
    "LOAN_APPLICATION_3",
]

In [49]:
events

['LOAN_APPLICATION_1', 'LOAN_APPLICATION_2', 'LOAN_APPLICATION_3']

The `len` of a collection is the number of elements it contains,

In [50]:
len(events)

3

A `tuple` is an immutable (non-editable), ordered collection (aka a row),

In [51]:
application = (
    "Michael",
    1_000,
    "6 months",
    "London"
)

In [52]:
len(application)

4

In [54]:
events.append("LOAN_APPLICATION_4") # modifies events

In [55]:
events

['LOAN_APPLICATION_1',
 'LOAN_APPLICATION_2',
 'LOAN_APPLICATION_3',
 'LOAN_APPLICATION_4']

In [56]:
application.append("SW1 1AA")

AttributeError: 'tuple' object has no attribute 'append'
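
A tuple cannot be changed, but a new tuple can be built from an old one. A minimal sketch, reusing the `application` values from above; the name `extended` is made up for the example,

```python
application = ("Michael", 1_000, "6 months", "London")

# Concatenation creates a brand-new tuple; the original is left unchanged.
# Note the trailing comma: ("SW1 1AA",) is a one-element tuple.
extended = application + ("SW1 1AA",)

print(len(application))   # 4
print(len(extended))      # 5
print(extended[-1])       # SW1 1AA
```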

The last major type of data collection is a `dict` (dictionary). This is a non-tabular data structure.

In [57]:
d = {
    "name": "michael", # key : value 
    "address" : {
        "street": "Old Street",
        "city": "London"
    }
}

In [58]:
d

{'name': 'michael', 'address': {'street': 'Old Street', 'city': 'London'}}

In [59]:
len(d)

2

## How do I access elements of these data structures?

Ordered collections in python, such as lists and tuples, are accessed *by position*,

In [117]:
events

['LOAN_APPLICATION_1',
 'LOAN_APPLICATION_2',
 'LOAN_APPLICATION_3',
 'LOAN_APPLICATION_4']

In [119]:
events[0] # the first event

'LOAN_APPLICATION_1'

The syntax to access an element in a collection uses `[]`, which are also used to define a list,

In [140]:
         # 0, 1, 2
prices = [2.2, 4.1, 6.5] # comma separated

prices[2] # variable_name[  number  ] <- FIND

6.5

In [125]:
application[0]

'Michael'

Lists and tuples can be indexed by position *forwards* from `0`, or *backwards* from `-1`,

In [126]:
prices[-1]

6.5

In [127]:
prices[-2]

4.1
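
To make the link between the two directions explicit, here is a short sketch showing that `-1` names the same element as `len(prices) - 1`, and what happens when a position does not exist. The name `last_position` is made up for the example,

```python
prices = [2.2, 4.1, 6.5]

last_position = len(prices) - 1   # 2

print(prices[-1] == prices[last_position])       # True
print(prices[-2] == prices[last_position - 1])   # True

# Asking for a position that does not exist raises an IndexError
try:
    prices[3]
except IndexError as error:
    index_error = str(error)
    print("IndexError:", index_error)
```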

With a dictionary, you access elements by name,

In [128]:
d

{'name': 'michael', 'address': {'street': 'Old Street', 'city': 'London'}}

`{key : value, ...}`

In [129]:
d['name'] # we use the key to access the value

'michael'

In [130]:
d['address']

{'street': 'Old Street', 'city': 'London'}

In [131]:
d['address']['street'] # indexes can be "sequenced"

'Old Street'
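
Asking for a key that is not present raises a `KeyError`, so it can help to check first. A small sketch of the usual tools (`in`, `.get`, and assignment); the `"phone"` key and its values are hypothetical additions for illustration,

```python
d = {
    "name": "michael",
    "address": {"street": "Old Street", "city": "London"},
}

print("name" in d)     # True
print("phone" in d)    # False

# .get returns a fallback value when the key is absent
phone = d.get("phone", "unknown")
print(phone)           # unknown

# Assigning to a key adds it (or overwrites an existing value)
d["phone"] = "not provided"
print(len(d))          # 3
```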

Lists can contain dictionaries,

In [132]:
dataset = [
    {'from': 'Alice', 'to': 'Eve', 'msg': 'Hi'},
    {'from': 'Eve', 'to': 'Alice', 'msg': 'Hi'},
    {'from': 'Alice', 'to': 'Bob', 'msg': 'Bye'},
]

In [136]:
dataset

[{'from': 'Alice', 'to': 'Eve', 'msg': 'Hi'},
 {'from': 'Eve', 'to': 'Alice', 'msg': 'Hi'},
 {'from': 'Alice', 'to': 'Bob', 'msg': 'Bye'}]

In [137]:
dataset[0]

{'from': 'Alice', 'to': 'Eve', 'msg': 'Hi'}

In [135]:
dataset[0]['from']

'Alice'

## Exercise ( 25 min )

Imagine you are working for a local bank which presently uses a paper system to record loan applications. 

This exercise is to sketch a potential set of data structures to describe a loan application and a set of loan applications.

1. Start simple. A loan application will be a tuple of five elements which contain:
    * name, age, location
    * other relevant fields (come up with these)
2. Show this application has five elements using `len`, also show:
    * the name, the location, *and the last entry*
3. To sketch a set of loan applications we will use a list. 
    1. create an empty list and save it to a variable
    2. use `.append` to add your application defined above
    3. keep `.append`ing some applications to your list
4. Use `len` to show how many applications there have been. And show the last application. 
5. Imagine for one of your loan applicants you have a `credit_file`. 
    1. Define a dictionary and save it to a variable called `credit_file`. 
    2. This dictionary should have the keys:
        * name, income, debts, ...
6. Display the `credit_file`
7. EXTRA:
    * Revise your dictionary above so it contains a `LoanHistory` key, 
        * a list of loans
    * and any other interesting "depth" you want to add. 

In [146]:
application = ("Michael", 32, "London", 0, 10_000)

apps = []
apps.append(application)
apps.append(application)
apps.append(application)
apps.append(("Alice", 32, "London", 0, 1_000))

In [147]:
len(application)

5

In [148]:
len(apps)

4

In [149]:
apps[0]

('Michael', 32, 'London', 0, 10000)

In [150]:
apps[-1]

('Alice', 32, 'London', 0, 1000)

In [151]:
credit_file = {
    "name": "Alice",
    "income": 50_000,
    "debts": 5_000,
    "history": [{"amount": 5_000, "status": "DF"}, {"amount": 2_000, "status": "DF"}, ]
}

In [155]:
credit_file["history"][0]["status"]

'DF'

In [156]:
"That's Great!"

"That's Great!"

In [157]:
'Gandhi said, "be the change you wish to see in the world!"'

'Gandhi said, "be the change you wish to see in the world!"'

# Part 4: Processing Data in (Base) Python

## How do I make decisions with data?

In [176]:
debt_ratio = credit_file['debts'] / credit_file['income']
last_loan_status = credit_file['history'][-1]['status']

In [180]:
debt_ratio, last_loan_status

(0.1, 'DF')

In [183]:
answer = None # None means "missing"

if debt_ratio > 0.15:
    answer = "NO"
elif last_loan_status == 'DF':
    answer = "NO"
else:
    answer = "YES"

In [184]:
answer

'NO'

In [185]:
if (debt_ratio > 0.15) or (last_loan_status == 'DF'):
    answer = "NO"
else:
    answer = "YES"

In [186]:
answer

'NO'

In [196]:
if (debt_ratio > 0.15) or (last_loan_status == 'DF'):
    answer = "NO"
elif (credit_file["income"] < 100_000) and (credit_file["debts"] > 1_000):
    answer = "MAYBE"
else:
    answer = "YES"

In [197]:
answer

'NO'
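
To apply the same rules to more than one applicant, the decision can be wrapped in a function. This is only a sketch: the function name `decide`, the applicants `bob` and `carol`, and the status code `"OK"` are all made up for illustration, and it assumes every credit file has at least one entry in `history`,

```python
def decide(credit_file):
    """Return "NO", "MAYBE" or "YES" using the same rules as the cell above."""
    debt_ratio = credit_file["debts"] / credit_file["income"]
    last_loan_status = credit_file["history"][-1]["status"]

    if (debt_ratio > 0.15) or (last_loan_status == "DF"):
        return "NO"
    elif (credit_file["income"] < 100_000) and (credit_file["debts"] > 1_000):
        return "MAYBE"
    else:
        return "YES"


alice = {"name": "Alice", "income": 50_000, "debts": 5_000,
         "history": [{"amount": 2_000, "status": "DF"}]}
# Hypothetical applicants; "OK" is an invented status code
bob = {"name": "Bob", "income": 50_000, "debts": 5_000,
       "history": [{"amount": 2_000, "status": "OK"}]}
carol = {"name": "Carol", "income": 120_000, "debts": 500,
         "history": [{"amount": 2_000, "status": "OK"}]}

print(decide(alice))   # NO
print(decide(bob))     # MAYBE
print(decide(carol))   # YES
```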

## How do I process datasets?

....

In [198]:
numbers = [1, 2, 3, 4]

In [202]:
total = 0

n = numbers[0]
total += n

n = numbers[1]
total += n

n = numbers[2]
total += n

n = numbers[3]
total += n

total

10

In [203]:
total = 0


for n in numbers:  # REPEAT, FROM n = numbers[0] ... n = numbers[-1]
    total += n
    
total

10
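
For this particular task Python also provides the built-in `sum`, which gives the same result as the loop. A brief comparison; the name `builtin_total` is made up for the example,

```python
numbers = [1, 2, 3, 4]

# Manual loop, as above
total = 0
for n in numbers:
    total += n

# Built-in equivalent
builtin_total = sum(numbers)

print(total, builtin_total)   # 10 10
```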

The purpose of a loop is to *control* and *make explicit* how the repetition of a process occurs, 

In [204]:
for n in reversed(numbers):
    print(n)

4
3
2
1
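
Loops and decisions combine naturally when processing a list of dictionaries. A minimal sketch using the message `dataset` from Part 3; the names `sent_by_alice` and `recipients` are made up for the example,

```python
dataset = [
    {'from': 'Alice', 'to': 'Eve', 'msg': 'Hi'},
    {'from': 'Eve', 'to': 'Alice', 'msg': 'Hi'},
    {'from': 'Alice', 'to': 'Bob', 'msg': 'Bye'},
]

sent_by_alice = 0
recipients = []

for message in dataset:               # one dictionary per repetition
    if message['from'] == 'Alice':    # decide per element
        sent_by_alice += 1
        recipients.append(message['to'])

print(sent_by_alice)   # 2
print(recipients)      # ['Eve', 'Bob']
```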


## How do I use predefined procedures to output data?

## How do I use predefined procedures to process data?

## How do I change between types?

You can use the name of a type to convert to it, 

In [97]:
answer = '5'

int(answer) + 5

10

In [100]:
app = list(application)
app.append('SW1 1AA')

In [101]:
app

['Michael', 1000, '6 months', 'London', 'SW1 1AA']
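
A few more conversions, sketched with illustrative values. Note that `int` applied to a float drops the fractional part instead of rounding, and that text which does not look like a number cannot be converted,

```python
as_float = float("4.5") + 1   # text -> float
as_text = str(5) + "5"        # int -> text, then joined
truncated = int(4.9)          # float -> int drops the fraction

print(as_float)    # 5.5
print(as_text)     # 55
print(truncated)   # 4

# Text that is not a number raises a ValueError
try:
    int("five")
except ValueError as error:
    conversion_error = str(error)
    print("ValueError:", conversion_error)
```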

## Aside:

In [102]:
datapoints = []
datapoints.append(1)
datapoints.append(1)
datapoints.append(1)
datapoints.append(1)
datapoints.append(1)

In [104]:
locked = tuple(datapoints)

In [106]:
locked.append(2)

AttributeError: 'tuple' object has no attribute 'append'

---

## Questions

## Is Jupyter Limited?

... only by how you write the program,

In [11]:
import numpy as np

In [20]:
numbers = np.random.normal(0, 1, 100_000_000)

In [21]:
numbers.mean()

-8.401655440936593e-05

In [22]:
numbers.std()

0.9999850368594061