Learning Objectives
===================

## Acknowledgements
Some of the examples in this notebook are taken from:
[Python Crash Course - A Hands-on, Project-based, introduction to programming](https://www.amazon.co.uk/Python-Crash-Course-Hands-Project-Based/dp/1593276036)

## Why Python?


According to a 2013 survey by industry analyst O’Reilly, 40 percent of data scientists responding use Python in their day-to-day work. They join the many other programmers in all fields who have made Python one of the top ten most popular programming languages in the world every year since 2003.

Organizations such as Google, NASA, and CERN use Python for almost every programming purpose under the sun… including, in increasing measures, data science.

According to Stack Overflow, the top 5 most wanted languages in 2019 are:
* Python: 25.7%
* JavaScript: 17.8%
* Go: 15.0%
* TypeScript: 14.6%
* Kotlin: 11.1%

The top most wanted languages are nearly identical to last year’s report.

## Python vs R

### R
Academics and statisticians have developed R over two decades. R now has one of the richest ecosystems for data analysis. There are around 12,000 packages available on CRAN, its open-source repository, so it is possible to find a library for almost any analysis you want to perform. This rich variety of libraries makes R the first choice for statistical analysis, especially for specialized analytical work.

What sets R apart from other statistical products is its output. R has fantastic tools for communicating results. RStudio comes with the knitr library, written by Yihui Xie, which makes reporting simple and elegant. Communicating findings in a presentation or a document is easy.

#### Advantages
* Graphs are made to talk, and R makes them beautiful
* Large catalog for data analysis
* GitHub interface
* RMarkdown
* Shiny
* A language designed for statistical analysis and data science

#### Disadvantages

* Slow
* Steep learning curve
* Dependencies between libraries

### Python
Python can do much the same tasks as R: data wrangling, engineering, feature selection, web scraping, app development and so on. Python is a tool for deploying and implementing machine learning at large scale. Python code is easier to maintain and more robust than R code. Years ago, Python didn't have many data analysis and machine learning libraries. More recently, it has been catching up and now provides cutting-edge APIs for machine learning and artificial intelligence. Most data science work can be done with five Python libraries: NumPy, pandas, SciPy, scikit-learn and Seaborn.

Python, on the other hand, makes replicability and accessibility easier than R. In fact, if you need to use the results of your analysis in an application or website, Python is the best choice.

#### Advantages
* Jupyter notebook: Notebooks help to share data with colleagues
* Mathematical computation
* Deployment
* Code Readability
* Speed
* Function in Python
* A general purpose programming language

#### Disadvantages
* Not as many libraries as R

This blog post highlights the differences and similarities between the syntax and outputs of the two languages.

https://www.dataquest.io/blog/python-vs-r/

## Python versions

There are currently two supported versions of Python, 2.7 and 3.7. Historically, 2.7 was the version of choice for data scientists, because packages such as NumPy, pandas and scikit-learn were more stable on it. Over time, however, Python 3 has been adopted as the version to use for all things data science. Python 2 is now considered a legacy version of the language and is due to be retired (read: no longer actively maintained) on January 1, 2020.


Somewhat atypically for an update to a programming language, Python 3 introduced many backwards-incompatible changes, so code written for 2.7 may not work under the latest versions of Python and vice versa. For this class all code will use Python 3.7.

You can check your Python version at the command line by running `python --version`.
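The same check can be made from inside Python itself. The snippet below is a minimal sketch, assuming you only need to know whether the interpreter is Python 3; the guard and its message are illustrative and not part of the course material.

```python
import sys

# sys.version_info describes the interpreter that is running this code.
major = sys.version_info.major
minor = sys.version_info.minor
print("Running Python", str(major) + "." + str(minor))

# Illustrative guard: stop early if the code is run under Python 2.
if major < 3:
    raise SystemExit("This course needs Python 3.")
```

If the guard triggers, you are most likely running an older `python` executable than the one you installed.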

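If you do not have Python yet, you can install it with Anaconda by following the guide for your operating system.

Windows: https://docs.anaconda.com/anaconda/install/windows/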

Mac: https://docs.anaconda.com/anaconda/install/mac-os/

Linux: https://docs.anaconda.com/anaconda/install/linux/

Verifying your installation: https://docs.anaconda.com/anaconda/install/verify-install/


## Modules

Modules are pre-built libraries that add functionality to the base Python installation. Before you can use one of these libraries, it needs to be installed.

#### To install a module using Anaconda
`conda install package-name`

#### To install a specific version of a module using Anaconda
`conda install package-name=2.3.4`

#### To list all packages installed using conda
`conda list`

## Using Pip instead of Conda
You may prefer not to use conda and to work with a basic installation of Python instead. In that case, you install packages with Pip.

#### To install a module using Pip
`pip install package-name`

#### To install a specific version of a module using Pip
`pip install package-name==2.3.4`

#### To list all packages installed using Pip
`pip list`

#### The modules you will require for this course
* pandas
* numpy
* matplotlib
* scikit-learn (imported in Python as `sklearn`)
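Once the packages are installed, you can ask Python whether it can find them. This is a minimal sketch using the standard-library `importlib` module; the helper name `is_installed` is made up for this example, and the list uses import names (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names of the modules used in this course.
required = ["pandas", "numpy", "matplotlib", "sklearn"]

def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

# Report each module without stopping at the first missing one.
for module_name in required:
    status = "installed" if is_installed(module_name) else "MISSING"
    print(module_name, "->", status)
```

Any module reported as missing can then be installed with conda or Pip.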

#### To access the modules in Python
`import pandas`

#### Using aliases
`import pandas as pd`
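An alias is only a shorter name for the same module. The sketch below uses the standard-library `statistics` module in place of pandas so that it runs without installing anything; the alias `stats` and the sample numbers are chosen purely for illustration.

```python
# Import the module under its full name and again under an alias.
import statistics
import statistics as stats

scores = [2, 4, 6]

# Both names refer to the same module, so both calls give the same result.
print(statistics.mean(scores))
print(stats.mean(scores))
```

The same pattern applies to `import pandas as pd`: after that line, you write `pd` wherever you would otherwise write `pandas`.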

## Let's check that Python installed correctly
Go to your command line or terminal and type `python --version`.

If this works, you should see something like
`Python 3.6.6 :: Anaconda custom (64-bit)`

If this doesn't work, take note of the error message.

### Using IDLE
IDLE is the standard Python development environment. Its name stands for "Integrated Development and Learning Environment". It works well on both Unix and Windows platforms.

It has a Python shell window, which gives you access to the Python interactive mode. It also has a file editor that lets you create new Python source files and edit existing ones.

You can type Python code directly into this shell, at the '>>>' prompt. Whenever you enter a complete code fragment, it will be executed. For instance, typing:

\>\>\> print("hello world")
 
and pressing ENTER, will cause the following to be displayed:

`hello world`

IDLE can also be used as a calculator:

\>\>\> 4+4

`8`

\>\>\> 8**3

`512`
 
Addition (+), subtraction (-), multiplication (*), division (/), modulo (%) and power (\*\*) operators are built into the Python language. This means you can use them right away. If you want to use a square root in your calculation, you can either raise something to the power of 0.5 or import the math module. Do not worry about what importing means right now; we will cover it later in the course.

Below are two examples of square root calculation:

\>\>\> 16**0.5

`4.0`

\>\>\> import math

\>\>\> math.sqrt(16)

`4.0`
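The two approaches can be compared side by side in a short script. This is a small sketch with sample numbers chosen for illustration; it also shows one thing to watch for, namely that the power operator is applied before a leading minus sign.

```python
import math

# Compare the two square-root approaches on a few perfect squares.
for number in [4, 16, 81]:
    by_power = number ** 0.5
    by_math = math.sqrt(number)
    print(number, by_power, by_math)

# ** binds more tightly than the minus sign, so this is -(16 ** 0.5).
negated = -16 ** 0.5
print(negated)  # -4.0
```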

### Using Spyder

Spyder is a powerful interactive development environment for the Python language with advanced editing, interactive testing, debugging and introspection features. A summary of its key features is available as a tutorial from inside Spyder. The homepage for Spyder is https://www.spyder-ide.org/ but if you installed Python using Anaconda, Spyder should already have been installed with it.

The name SPYDER derives from "Scientific PYthon Development EnviRonment" (SPYDER).

It can be used as the main environment for learning about Python, programming, and computational science and engineering, and you will see that its layout and features are similar to those of RStudio.

Useful features include

* provision of the IPython (Qt) console as an interactive prompt, which can display plots inline
* ability to execute snippets of code from the editor in the console
* continuous parsing of files in editor, and provision of visual warnings about potential errors
* step-by-step execution
* variable explorer similar to RStudio


### Using Jupyter Notebooks

The Jupyter Notebook is an open source web application that you can use to create and share documents that contain live code, equations, visualizations, and text. Jupyter Notebook is maintained by the people at Project Jupyter.

Jupyter Notebooks are a spin-off from the IPython project, which used to have an IPython Notebook project itself. The name Jupyter comes from the core programming languages that it supports: Julia, Python, and R. Jupyter ships with the IPython kernel, which allows you to write your programs in Python, but there are currently over 100 other kernels that you can also use.

You can use pip or conda to install the Jupyter Notebook package. If you installed Python using Anaconda, the Jupyter Notebook package was probably installed along with it. If it wasn't, you can install it with either of the following commands.

`pip install jupyter`  

`conda install jupyter`


### Some programming basics

## Working with Strings

In [1]:
## The most straightforward way to produce output is the print function, for example
print("Hello World")

Hello World


In [2]:
## We can take this further and assign strings to variables and then use the print function to print the variable
message = "Hello World"
print(message)

Hello World


In [3]:
## Let's look at what happens if you try to print a message but get the variable name wrong.
message = "Hello World"
print(mesage)

NameError: name 'mesage' is not defined

In [4]:
## Here the error is clearly stated, and the full error message also points to the line in which the error happened.
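If you want to look at the error message without the program stopping, you can catch the error. The sketch below uses `try` and `except`, which have not been introduced yet and are shown here only as an illustration; the variable `error_text` is a name chosen for this example.

```python
message = "Hello World"

try:
    print(mesage)  # deliberate typo: the variable is called "message"
except NameError as error:
    # The error object carries the same text that Python would have displayed.
    error_text = str(error)
    print("Python reported:", error_text)
```

The fix is to correct the spelling so that the name in `print` matches the name the variable was given.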

In [7]:
## We can do several useful things with strings by calling functions which manipulate strings.
name = "Alan Turing"
print(name)
## By calling the .lower() method, we can set the whole string to lower case
print(name.lower())
## By calling the .upper() method, we can set the whole string to upper case
print(name.upper())
## By calling the .title() method, we can capitalise the first letter of each word
print(name.title())
## We can also save the output of the function to a different variable
capName = name.title()
print(capName)

Alan Turing
alan turing
ALAN TURING
Alan Turing
Alan Turing
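One detail worth knowing is that these calls never change the original string; they return a new one. A short sketch, using a lower-case starting value (an assumption made for this example) so that the difference is visible:

```python
name = "alan turing"

# Calling a string method returns a new string...
titleName = name.title()

# ...and leaves the original variable untouched.
print(name)       # alan turing
print(titleName)  # Alan Turing

# To keep the change, assign the result back to the same variable.
name = name.title()
print(name)       # Alan Turing
```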


In [9]:
## It can be useful to concatenate two strings (this means joining them together). For example:
firstName = "Alan"
lastName = "Turing"
print(firstName + ' ' + lastName)
## You can also pass several values to print, separated by commas
print(firstName, lastName) # Note: this puts a space between the two values automatically,
# whereas with the first statement you have to insert the space yourself.

Alan Turing
Alan Turing
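The space that `print` inserts between comma-separated values can be changed, and there is also a string method for joining several strings at once. A brief sketch of both, reusing the variable names from the cell above:

```python
firstName = "Alan"
lastName = "Turing"

# print() separates its arguments with a space by default;
# the sep argument replaces that separator.
print(firstName, lastName, sep="-")  # Alan-Turing

# The join method builds one string from a list of strings,
# placing the separator (here a single space) between the items.
fullName = " ".join([firstName, lastName])
print(fullName)  # Alan Turing
```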


In [12]:
## Now we can put a lot of what we've used together.
firstName = "alan"
lastName = "turing"
fullName = firstName + ' ' + lastName
print("Hello,", fullName.title() + "!")

Hello, Alan Turing!


## Working with Numbers

In [15]:
# Here are the basic arithmetic operations you should be familiar with.
# Note: a notebook cell displays only the value of its last expression, here 6 ** 2.
6 + 3 # Addition
6 - 3 # Subtraction
6 * 3 # Multiplication
6 / 3 # Division
5 % 3 # Modulo - Also known as the remainder
6 ** 2  # Raise to the power of 2 

36
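To see every result rather than only the last one, wrap each expression in `print`. A minimal sketch using the same numbers; the variable `quotient` is added for illustration, to show that division always produces a decimal number in Python 3.

```python
print(6 + 3)   # 9
print(6 - 3)   # 3
print(6 * 3)   # 18
print(5 % 3)   # 2
print(6 ** 2)  # 36

# Division with / gives a float, even when the answer is a whole number.
quotient = 6 / 3
print(quotient)  # 2.0
```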

In [16]:
# We can also combine string operations and arithmetic operations
age = 23
message = "Happy " + age + "rd Birthday!"
print(message)

TypeError: must be str, not int

In [None]:
# This fails because you cannot add a string to a number. For it to work, you need to
# convert the number to a string first, using a technique called casting.
# Wrapping your variable in the str() function converts whatever is inside the
# parentheses into a string.
age = 23
message = "Happy " + str(age) + "rd Birthday!"
print(message)
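Two related techniques are sketched below as an aside. An f-string (available from Python 3.6) does the conversion to text for you, and `int()` casts in the other direction, from a string of digits to a number. The variable `nextAge` is a name chosen for this example.

```python
age = 23

# An f-string converts the value inside the braces to text automatically.
message = f"Happy {age}rd Birthday!"
print(message)  # Happy 23rd Birthday!

# int() turns a string of digits into a number you can do arithmetic with.
nextAge = int("23") + 1
print(nextAge)  # 24
```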

## Comments

Comments are notes in your code that Python ignores when the code runs. There are two main reasons to write them.

Reason #1

In an organization we work in teams, and many programmers work on the same project. Well-commented functions and logic help other programmers to understand the code and the reasoning behind the solution to a problem.

Reason #2

When you read or edit your code later, comments remind you of the logic you had in mind when you wrote it.

Programmers who do not comment their code properly often forget the logic they implemented and waste time working it out again.

So, I recommend that you comment your code properly so that you and your colleagues can understand the logic better. Writing comments takes time, but it keeps your code in line with good coding standards.

To comment your code, put a # at the start of the line, or at the point in the line where you want the comment to begin.

`# This is a comment`

`8 * 2 # This resolves to 16`
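Here is how both styles of comment might look in a slightly longer piece of code. The variable names and values are invented for this example.

```python
# Price of one ticket in pounds (illustrative value).
ticketPrice = 12.50

# Number of people in the group.
groupSize = 4

total = ticketPrice * groupSize  # total cost for the whole group
print(total)  # 50.0
```

Comments like these explain what the values mean, which is the part a reader cannot work out from the code alone.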

### Some Extra Resources

- [ ] [Book: Fluent Python](https://www.amazon.co.uk/Fluent-Python-Luciano-Ramalho/dp/1491946008)
- [ ] [Book: Python Crash Course - A Hands-on, Project-based, introduction to programming](https://www.amazon.co.uk/Python-Crash-Course-Hands-Project-Based/dp/1593276036)