# **Introduction to Python. Day 1**

## *Dr Kirils Makarovs*

## *k.makarovs@exeter.ac.uk*

## *University of Exeter Q-Step Centre*



---


# **Welcome to Day 1!**

## **The purpose of today is to get yourself ready to code in Python. This includes:**

+ Getting used to Python workflow via **Google Colaboratory** and **Jupyter Notebook**
+ Getting to know Python syntax and basic commands

<figure>
<left>
<img src=https://miro.medium.com/max/502/1*sXs3TvhjvXcVCTldKnwMpA.png  width="400">
</figure>


## **What is a Jupyter Notebook?**

The *Jupyter Notebook* is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Basically, running Python in *Jupyter Notebook* allows you to combine *text*, *code*, and *code output* in a single notebook that can be then saved as a PDF or HTML document.

We will run our Jupyter Notebooks via Google Colaboratory (Colab), which allows you to write and execute Python code in your browser. Check out the short video below on how it all works.

You can find more information about Jupyter Notebooks [here](https://jupyter.org/), [here](https://www.dataquest.io/blog/jupyter-notebook-tutorial/), and [here](https://www.youtube.com/watch?v=2eCHD6f_phE).

Also, take a look at [this](https://colab.research.google.com/?utm_source=scs-index#) example notebook to get a sense of what you can do with it!

In [None]:
from IPython.display import YouTubeVideo

YouTubeVideo('inN8seMm7UI', width = 800, height = 450)


## **How to combine text and code in one workflow?**

By using Text Cells and Code Cells!

**Code cells** are where the code is written and executed.

**Text cells** are used to describe the code and its output, and their appearance can be formatted quite flexibly.

*Before diving into coding, let us briefly look at how one can format text in Jupyter Notebooks.*



---





# **1. Text cells in Jupyter Notebooks**

In *text cells* you can create:

# h1 Heading
## h2 Heading
### h3 Heading
#### h4 Heading

## Emphasis

**This is bold text**

__This is bold text__

*This is italic text*

_This is italic text_

~~Strikethrough~~

## Lists

Unordered

+ Create a list by starting a line with `+`, `-`, or `*`
+ Sub-lists are made by indenting 2 spaces:
  + Marker character change forces new list start:
    + Ac tristique libero volutpat at
    + Facilisis in pretium nisl aliquet
    + Nulla volutpat aliquam velit
+ Very easy!

Ordered

1. Lorem ipsum dolor sit amet
2. Consectetur adipiscing elit
3. Integer molestie lorem at massa

## Tables

| Option | Description |
| ------ | ----------- |
| data   | path to data files to supply the data that will be passed into templates. |
| engine | engine to be used for processing templates. Handlebars is the default. |
| ext    | extension to be used for dest files. |

Check out [this page](https://markdown-it.github.io/) for more!

---


# **2. Helpful small tips**

One way to make your life easier is to use keyboard shortcuts when navigating through the notebook!

Here are the most common ones:

| Command | Windows | Mac
| ------- | -------- | ---
| Run entire cell | ctrl + enter | ctrl + enter
| Run entire cell and move to the next one | shift + enter | shift + enter
| Run single line in a cell | ctrl + shift + enter | ctrl + shift + enter
| Insert code cell above | ctrl + m + a | ctrl + m + a
| Insert code cell below | ctrl + m + b | ctrl + m + b
| Switch from code to text cell | ctrl + m + m | ctrl + m + m
| Switch from text to code cell | ctrl + m + y | ctrl + m + y
| Move cell up | ctrl + m + k | ctrl + m + k
| Move cell down  | ctrl + m + j | ctrl + m + j
| Delete cell  | ctrl + m + d | ctrl + m + d

In addition, let me clarify the distinction between running an entire *cell of code* and running a *single line of code*.

It is a good practice to structure your code in such a way that one cell contains a chunk of code that is devoted to one particular task. 

However, keep in mind that (unless you use print statements explicitly) one code cell produces only one piece of output, and that output comes from the last statement in the cell.

See example below:


In [None]:
# As you can see, even though I asked to show both x and y objects,
# if I simply run a code cell, only y will be produced

x = 4

x

y = 5

y


In [None]:
# You can overcome this by using print statements, but it's not very handy in notebooks

x = 4

print(x)

y = 5

print(y)


In [None]:
# Ultimately, if you have two tasks with separate pieces of output that you want to be produced, use two different code cells

x = 4

x


In [None]:
y = 5

y


In [None]:
# However, please also note that you can highlight a single line in a code cell and run it
# by using the ctrl + shift + enter shortcut

4 ** 4 # highlight this line and run it via ctrl + shift + enter

4 ** 6 # then highlight this line and run it in the same way

# You see that in this way you can sequentially get more than one output from a single code cell

# This is not used very often when you write a proper research notebook, however the notebooks for
# this course (especially the very first session) are structured with this in mind to save space and make
# them less voluminous


### ***Now let's dive into Python!***

---

# **3. Basics of Python**



In [None]:
1 + 3 # evaluation

a = 1 + 3 # object assignment

a=4 # spacing around operators doesn't matter

a 

b = a ** 4 # a to the power of 4

b


In [None]:
a = 15

b = 6

a == b # is a equal to b?

a != b # is a not equal to b?

a > b # is a greater than b?

a < b # is a smaller than b?

a >= b # is a greater than or equal to b?

a <= b # is a smaller than or equal to b?


In [None]:
# Basics of working with lists

# You can think of a list as a simple collection of objects.
# Objects can be of different types - we will talk about this later.

values = [5, 'hello', 18.1, True, 17, 'Monday']

values

values[0] # accessing the first element (note that it starts with 0!)

values[-1] # accessing the last element

values[1:3] # accessing the 2nd and the 3rd elements (note that the end index of a slice is not included!)

values[3:] # accessing all elements from the 4th one onwards (the 4th one included)


<figure>
<left>
<img src=https://static.javatpoint.com/python/images/lists-indexing-and-splitting.png width="400">
</figure>

**[Image source](https://www.analyticsvidhya.com/blog/2021/06/15-functions-you-should-know-to-master-lists-in-python/)**
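
To see all of the indexing results at once rather than line by line, here is a minimal sketch that wraps each expression from the cell above in `print()`. The last line, `values[:2]`, is an extra example that is not in the original cell.

```python
# The same list as in the code cell above
values = [5, 'hello', 18.1, True, 17, 'Monday']

print(values[0])    # first element: 5
print(values[-1])   # last element: Monday
print(values[1:3])  # 2nd and 3rd elements: ['hello', 18.1]
print(values[3:])   # 4th element onwards: [True, 17, 'Monday']
print(values[:2])   # extra example, first two elements: [5, 'hello']
```

A handy rule of thumb: a slice `values[start:stop]` always contains `stop - start` elements, as long as both indices are within the list.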

In [None]:
# Let's import the numpy library

# numpy is a library that enables you to work with multi-dimensional arrays (e.g. vectors)
# it's very handy when you want to do element-by-element operations

import numpy as np


In [None]:
# Note that if you want to perform any element-by-element operations (like with vectors in R),
# you should convert a list to an array (that's the main reason we need the numpy library)

# All objects in the numpy array should be of the same type (e.g. all numbers)

my_list = [1, 2, 3]

my_list == 1 # False

my_list[0] == 1 # True

my_list == [1, 2, 3] # True

# Now converting list into an array with the np.array() command

my_list_array = np.array(my_list)

my_list_array

my_list_array[0] == 1 # True

my_list_array == 1 # array([ True, False, False]) - each element is compared with 1


In [None]:
# Another example of element-by-element operations with arrays

my_list = [1, 2, 3, 4]

# Look at the difference between:

# Multiplying a list by 0
my_list * 0 

# And multiplying an array by 0
np.array(my_list) * 0 

# Can you predict what is going to be an outcome of this operation?
np.array([1, 2, 3, 4]) * np.array([0, 0, 0 , 1])


In [None]:
# Some useful things you can do with lists

numbers = [1, 5, 10, 15, 0, 0, 17]

colors = ['blue', 'yellow', 'orange', 'orange', 'blue']

# You can:

numbers + colors # concatenate lists

colors * 3 # repeat a list three times

'pink' in colors # check if the element is in a list

colors[1] == 'yellow'

# Helpful functions to be performed on a list (works best with numbers)

len(numbers) # get the length of a list

min(numbers) # get the minimum value

max(numbers) # get the maximum value

sum(numbers) # get the sum of all values
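
Why do these functions "work best with numbers"? Here is a hedged sketch of what happens when a list mixes numbers and strings. The `mixed` list and the `mixed_failed` flag are names I made up for this illustration.

```python
numbers = [1, 5, 10, 15, 0, 0, 17]
colors = ['blue', 'yellow', 'orange', 'orange', 'blue']

# On a list of numbers everything works as expected
print(len(numbers), min(numbers), max(numbers), sum(numbers))  # 7 0 17 48

# len() works on any list, but min() cannot compare numbers with strings
mixed = numbers + colors
print(len(mixed))  # 12

mixed_failed = False
try:
    min(mixed)
except TypeError as error:
    mixed_failed = True
    print('TypeError:', error)
```

The same problem appears with `max()` and `sum()` on a mixed list, so it is safest to keep numbers and strings in separate lists.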


## **Exercise**

Alright, we've covered a lot of material by now, so let's put it into practice to get a better grip of Python basics!

In [None]:
# Here is the list of randomly generated numbers for you

import random # import library with .sample() function

task_list = random.sample(range(0, 100), k = 25) # generate a list of 25 distinct random integers from 0 to 99 (sampling without replacement)

task_list


In [None]:
# Part 1

task_list[:10] # get the first 10 elements from the list

task_list[1:5] # get the elements from the 2nd to the 5th

task_list[15:] # all the elements from the 16th one onwards

task_list[14] == task_list[24] # check whether the 15th element is the same as the 25th

task_list[3] >= task_list[22] # check whether the 4th element is greater than or equal to the 23rd element

82 in task_list # check whether 82 is within the list

min(task_list) > 5 # check whether the minimum value of the list is greater than 5

len(task_list) == 25 # check that there are indeed 25 elements in the list

sum(task_list) # get the sum of all elements in the list


In [None]:
# Part 2

# Can you find how many values from the list are greater than its average value?

# To get the average value, convert the list to a numpy array and use its .mean() method

# Example:
np.array([5, 17, 19, 8]).mean() # the average is 12.25


In [None]:
# Create an object with the average value
avg = np.array(task_list).mean()

np.array(task_list) > avg # for each value in the list, test whether it is greater than the average value

# Use sum() to get a total number of True values
sum(np.array(task_list) > avg)
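
The counting trick above works because `True` counts as 1 and `False` as 0 when summed. Below is a minimal sketch comparing three equivalent ways of counting. The fixed seed is my own assumption, added only so that the sketch gives the same list every time you run it.

```python
import random
import numpy as np

random.seed(42)  # assumption: a fixed seed, used only to make this sketch reproducible
task_list = random.sample(range(0, 100), k=25)

arr = np.array(task_list)
above_avg = arr > arr.mean()  # boolean array: True where the value exceeds the average

count_builtin = sum(above_avg)               # built-in sum(): each True counts as 1
count_method = above_avg.sum()               # the array's own .sum() method
count_nonzero = np.count_nonzero(above_avg)  # numpy function that counts the True values

print(count_builtin, count_method, count_nonzero)
```

All three numbers are the same. On large arrays the numpy versions tend to be faster than the built-in `sum()`, but for a list of 25 values any of them is fine.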


---

# **4. Pandas. Working with dataframes in Python**

<figure>
<left>
<img src=https://miro.medium.com/max/481/1*n_ms1q5YoHAQXXUIfeADKQ.png  width="450">
</figure>

## **Loading dataframes from external sources**

With *Pandas*, you can open datasets that are stored in a great variety of formats

(just start typing `pd.read_` in a code cell and you'll see all the possible options)

Here is a list of most common types of data and commands to open them

| Data format | Explanation | Command
| ------- | -------- | ---
| .csv | Comma-separated values | pd.read_csv
| .xls / .xlsx | Excel spreadsheet | pd.read_excel
| .dta | STATA Data file format | pd.read_stata
| .sav | SPSS Data file format | pd.read_spss

This is what a typical dataset looks like:

<figure>
<left>
<img src=https://media.geeksforgeeks.org/wp-content/uploads/finallpandas.png width="550">
</figure>

**[Image source](https://www.geeksforgeeks.org/python-pandas-dataframe/)**




In [None]:
# Import pandas library

import pandas as pd


In [None]:
# Let's upload the mtcars dataset into the current Google Colab session

from google.colab import files

uploaded = files.upload()


In [None]:
# Here is an example of how to open a .csv file as a dataframe in Python using the Pandas library

df = pd.read_csv('mtcars.csv')

df


In [None]:
# Some helpful commands to get to know the dataset

type(df) # object type - pandas.core.frame.DataFrame

df.shape # number of rows and columns

df.columns # column names

df.index # row index (row labels)

df.info() # summary of the variables in the dataset
# Dtype column shows what type of variable it is - we'll talk about it later

df.head(10) # get the top 10 rows of the dataset

df.tail(10) # get the last 10 rows of the dataset

df.describe() # picks out only numerical variables (Dtype int64 or float64)
# and shows basic descriptive statistics
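
If you do not have `mtcars.csv` at hand, you can try the same commands on a tiny dataframe built by hand. This is only a sketch: the column names and all values in `df_demo` are made up for illustration and are not taken from the mtcars dataset.

```python
import pandas as pd

# Hypothetical dataframe, all values invented for this sketch
df_demo = pd.DataFrame({
    'model': ['A', 'B', 'C', 'D'],
    'speed': [10.5, 12.0, 9.5, 14.0],
    'doors': [2, 4, 4, 2],
})

print(df_demo.shape)          # (4, 3)
print(list(df_demo.columns))  # ['model', 'speed', 'doors']
print(df_demo.head(2))        # the first 2 rows

summary = df_demo.describe()  # the text column 'model' is left out
print(summary)
```

Note how `describe()` keeps only the numeric columns `speed` and `doors` and silently drops `model`.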


# **That's the end of Day 1!**