# Part 1 - Introduction to Python

This tutorial assumes you have some experience with programming languages and, ideally, a little knowledge of Python. Nonetheless, we will start by reviewing the basic constructs of the Python language.

![Python Logo](https://www.python.org/static/community_logos/python-logo-master-v3-TM.png)

There are many excellent resources on the internet for learning Python. This short series of tutorials condenses the parts that matter most for data analysis. For a more in-depth understanding of Python, the following excellent tutorials are recommended:

* Software Carpentry https://swcarpentry.github.io/python-novice-inflammation/index.html
* DataCamp https://www.datacamp.com/courses/intro-to-python-for-data-science


Python is an interpreted language, which means that you can run code interactively, unlike an ahead-of-time compiled language (e.g. C++ or Fortran), where you have to build your application into machine code before running it. Your code runs in an interpreter, which translates it into instructions for the machine you are running on. This means that, most of the time, your Python code can run in many different environments, so long as a Python interpreter is installed.
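As a small illustration, the interpreter itself can report which version it is and which platform it is running on. This is only a sketch using the standard-library `platform` and `sys` modules; the variable names are chosen for this example, and the output will differ from machine to machine.

```python
import platform
import sys

# Ask the running interpreter for its version and the operating system.
python_version = platform.python_version()
operating_system = platform.system()

print(f'Python {python_version} on {operating_system}')
# Path of the interpreter program that is executing this code.
print(f'Interpreter executable: {sys.executable}')
```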

Another key concept of the Python language is whitespace (that is, spaces and tabs at the start of a line), which is used to divide your program into blocks of code. Unlike languages such as C++ or Java, where braces mark out a block and a semicolon (;) ends each statement, in Python the indentation of the code denotes the beginning and end of a block.
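As a minimal sketch of indentation defining a block (the variable names and values are made up for this illustration), the indented lines below belong to the `if` block, while the unindented line after them always runs:

```python
# Hypothetical values chosen only for illustration.
temperature = 30
messages = []

if temperature > 25:
    # These two indented lines form the body of the if block.
    messages.append('It is warm')
    messages.append('Consider opening a window')

# This line is not indented, so it is outside the block and always runs.
messages.append('Check complete')

print(messages)
```

Changing `temperature` to a value of 25 or lower would skip the two indented lines but still append the final message.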

We will now go through the basic language constructs and their syntax. To start with, let's look at some basic statements.

Assign the result of some basic maths to a variable:

In [1]:
a = 3
b = 2
a_times_b = a * b
a_squared_plus_b = a ** 2 + b

In [2]:
threshold = 5 
result_above_threshold = a_times_b > threshold
result_above_threshold

True

### Strings

Python makes working with strings straightforward. A string is written between matching single or double quotes.

In [3]:
print('this is a string')

this is a string


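The very useful string method `format` allows you to easily mix results you have calculated into a string. The cell below shows two ways to do this: calling `format` explicitly, and using an f-string, which does the same job with more compact syntax. You will see that the two resulting strings are identical.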

In [4]:
s1 = 'The result of a multiplied by b is {result}'.format(result=a_times_b)
s2 = f'The result of a multiplied by b is {a_times_b}'
print(s1)
print(s2)

The result of a multiplied by b is 6
The result of a multiplied by b is 6
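Both approaches also accept a format specification after a colon, which is handy for controlling how numbers are displayed. A small sketch, assuming a made-up value called `ratio`:

```python
ratio = 2 / 3  # hypothetical value for illustration

# .3f displays the number rounded to three decimal places.
s3 = 'The ratio is {value:.3f}'.format(value=ratio)
s4 = f'The ratio is {ratio:.3f}'

print(s3)
print(s4)
```

Both lines print `The ratio is 0.667`; the stored value of `ratio` is unchanged, only its displayed form is rounded.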


### Conditional statements
Conditional statements are the part of a programming language that allows you to do different things depending on a condition, e.g. the value of a variable.

In [5]:
if a_times_b > threshold:
    print(f'The result of {a_times_b} is greater than the threshold of {threshold}')
else:
    print(f'The result of {a_times_b} is not greater than the threshold of {threshold}')


The result of 6 is greater than the threshold of 5
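When there are more than two cases to distinguish, `elif` adds further conditions between `if` and `else`. A minimal sketch, repeating the values from the cells above so that it runs on its own (the `comparison` variable is introduced just for this example):

```python
a_times_b = 6
threshold = 5

if a_times_b > threshold:
    comparison = 'greater than'
elif a_times_b == threshold:
    # Only checked when the first condition was False.
    comparison = 'equal to'
else:
    # Reached only when neither condition above was True.
    comparison = 'less than'

print(f'The result of {a_times_b} is {comparison} the threshold of {threshold}')
```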


### Repetition
An important part of any programming language is being able to do the same action multiple times, usually applying it to each item in a collection. In Python this is done with a `for` or `while` loop.

In [6]:
for index1 in range(5):
    print(f'Doing job on item {index1}')

Doing job on item 0
Doing job on item 1
Doing job on item 2
Doing job on item 3
Doing job on item 4


In [7]:
index2 = 0
while index2 < threshold:
    print(f'Doing job on item {index2}')
    index2 += 1

Doing job on item 0
Doing job on item 1
Doing job on item 2
Doing job on item 3
Doing job on item 4
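The loops above count through numbers, but a `for` loop can also visit each item of a collection directly. A small sketch using a made-up list of names; the built-in `enumerate` supplies the position alongside each item:

```python
# Hypothetical list of items to process.
samples = ['alpha', 'beta', 'gamma']
jobs_done = []

for position, sample in enumerate(samples):
    # position counts from 0; sample is the item at that position.
    jobs_done.append(f'Doing job on item {position}: {sample}')

for line in jobs_done:
    print(line)
```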


### Reusing code - functions

We often find that we need to perform the same action many times but as part of different workflows, so a loop is not suitable. To use code repeatedly without copying and pasting, we define a function and then call it. We will demonstrate this by defining a function to calculate the area of a triangle using Heron's formula:
https://www.mathopenref.com/heronsformula.html

In [8]:
def triangle_perimeter(a,b,c):
    """
    This function calculates the perimeter of a triangle, given by the lengths of the 3 sides a,b,c.
    """
    return a+b+c    

In [9]:
def triangle_area(a,b,c):
    """
    This function calculates the area of a triangle, given by the lengths of the 3 sides a,b,c.
    """
    semiperimeter = 0.5 * triangle_perimeter(a,b,c)
    area = (semiperimeter * (semiperimeter - a) * (semiperimeter - b) * (semiperimeter - c)) ** 0.5
    return area
    

In [10]:
triangle_perimeter(3,4,5)

12

In [11]:
triangle_area(3,4,5)

6.0
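One benefit of wrapping the calculation in a function is that it can be reused in other workflows, for example inside a loop over several triangles. The sketch below repeats the two functions so that it runs on its own; the list of side lengths is made up for illustration.

```python
def triangle_perimeter(a, b, c):
    """Return the perimeter of a triangle with side lengths a, b and c."""
    return a + b + c


def triangle_area(a, b, c):
    """Return the area of a triangle using Heron's formula."""
    semiperimeter = 0.5 * triangle_perimeter(a, b, c)
    return (semiperimeter * (semiperimeter - a)
            * (semiperimeter - b) * (semiperimeter - c)) ** 0.5


# Hypothetical triangles, each given as three side lengths.
triangles = [(3, 4, 5), (5, 12, 13), (6, 8, 10)]

areas = []
for a, b, c in triangles:
    # The same function is called once per triangle.
    areas.append(triangle_area(a, b, c))

print(areas)
```

Note that the function does not check whether the three lengths can actually form a triangle; a real workflow might want to add that check.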

In this notebook we have covered the basic syntax and elements of Python that are needed for manipulating data. Over the next few notebooks we will build up the ability to perform common data tasks using these elements.

Although it is important to know and understand these elements, we will see later that we rarely use them explicitly for real data science work. Instead, we work with large collections and objects that represent the data in a more intuitive way and are often more efficient for processing large real-world datasets.


## Appendix - Running Python

There are several ways to run Python. I will describe them in the context of Data Science work.

### Command line - interactive
The most common way to run Python is to start an interactive session from the command line.

`python`

Alternatively, you can start an interactive session after first running some preparatory code:

`python -i my_setup_code.py`

### Command line - batch

Another common approach is to run some code as a command. This can be done in several ways.

* run a single line of code: `python -c "print('hello world')"`
* run the contents of a file: `python my_code.py --arg1 value1 --arg2 value2 --flag1`
* add a shebang line to the top of your file and make it executable (e.g. with `chmod +x`), so that it runs as a script without mentioning python
  * shebang: `#!/usr/bin/env python`
  * calling: `./my_exec_script --arg1 value1 --arg2 value2 --flag1`
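To give an idea of how a script might read arguments like those in the commands above, here is a minimal sketch using the standard-library `argparse` module. The argument names `--arg1`, `--arg2` and `--flag1` simply mirror the placeholder names used above, and the `parse_arguments` helper is an assumption of this sketch, not something defined elsewhere in this tutorial.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the placeholder arguments; argv=None reads the real command line."""
    parser = argparse.ArgumentParser(description='Example batch script.')
    parser.add_argument('--arg1', default=None, help='first example value')
    parser.add_argument('--arg2', default=None, help='second example value')
    # store_true makes --flag1 a switch: True if present, False otherwise.
    parser.add_argument('--flag1', action='store_true', help='example flag')
    return parser.parse_args(argv)


# An explicit list is parsed here so the sketch runs anywhere, including
# in a notebook; a real script would call parse_arguments() with no argument.
args = parse_arguments(['--arg1', 'value1', '--arg2', 'value2', '--flag1'])
print(f'arg1={args.arg1}, arg2={args.arg2}, flag1={args.flag1}')
```

Passing the argument list explicitly also makes the parsing logic easy to test without launching a separate process.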



### Jupyter notebooks - a useful data science tool

![Jupyter logo](https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Jupyter_logo.svg/518px-Jupyter_logo.svg.png)

Jupyter notebooks are a way of presenting a document that combines text, pictures etc. (in markdown) with executable code and the output of that code.

There is an execution kernel for each notebook, which runs the code. Content is divided up into cells. Cells can be run in any order, and each run changes the state of the kernel. Best practice is to keep the execution order aligned with the top-to-bottom order of the content in the notebook.

Documentation: https://jupyterlab.readthedocs.io/en/stable/

Below are some example cells that demonstrate these concepts.

In [15]:
# this is a code cell
a = 3
b = 5


In [16]:
# if the result is assigned to a variable, it is stored as usual and nothing is displayed
c = a * b

In [17]:
# if the result is not assigned to a variable, it is displayed. This demonstrates how code and output are stored in the notebook
a * b

15

In [18]:
c

15

Jupyter notebooks are a very useful tool for Data Science, as they facilitate development of relevant code as well as communication of and about the data, code and results all in the same place. We will see some of the other advantages of Jupyter Notebooks for Data Science development and communication through the rest of these tutorials.