# Overview

This course introduces Python in a way that lets you quickly get started with data analysis, building on your R knowledge. It covers functions for importing, manipulating and visualising data, as well as some basic statistical analysis. Because the emphasis is on using Python for these tasks, we will not cover a number of features that would be taught in a typical programming course, including some of Python's built-in objects and their methods, working with matrices and arrays in NumPy, conditional flow, and function writing. Where appropriate, the material includes notes for further reference.

# Format of Course Material

Items appearing in this material are sometimes given a special appearance to set them apart
from regular text. Here is how a code example will look:

In [2]:
myvar = 1 + 2 # This is a comment
myvar

3

The format of these code blocks emulates how the code would be run in an interactive Python
notebook: the `In [n]:` label marks an input cell containing lines of code, and the text that
follows the cell shows the output you would expect to be returned on screen.

In addition to the code environments there are three text boxes which highlight Exercises, Tips
& Tricks, as well as Warnings.

**Exercise: An Example Exercise Box** 

This box will detail exercises to be performed during (or after) the training course, e.g.,

1. Load the mtcars dataset into a data frame.
2. Find the mean values of each column.

**Tip: An Example Tip Box**

These boxes will detail an additional feature of Python or a helpful shortcut based on user experience.

**Warning: An Example Warning Box** 

These boxes will detail a warning, typically describing non-intuitive aspects of the Python language or common pitfalls that are encountered.

## Course Script and Exercise Answers

A great deal of code will be executed within Python during the delivery of this training. This includes the answers to each exercise, as well as other code written to answer questions that arise. Following the course, you will be sent a notebook containing all the code that was executed.

# What is Python?

Python is a powerful general-purpose programming language with widespread use in many
application domains. Python is open source and free
to use (even for commercial products), and is available for all major operating systems.

The principal author of Python, Guido van Rossum, began work on it in the late 1980s and played a central role in its development as its "Benevolent Dictator For Life" (BDFL) until he stepped down from that role in July 2018. The Python Software Foundation (PSF), a non-profit organization, continues to be devoted to the Python programming language and its administration. Python has many contributors from all over the world, and the PSF is supported by many international sponsors.

## Key Features

The main differentiating factors of the Python language can be described as follows:

* At its core Python was designed for readability and clarity, and therefore has minimised the necessary syntax, making it easier to pick up and understand code you did not write.
* Python comes with an extensive standard library, but also allows easy addition of custom libraries.
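
As a small illustration of both points, the sketch below uses only the standard library's `statistics` module. The `scores` data and variable names are made up for this example.

```python
import statistics

# Made-up sample data, for illustration only
scores = [3, 5, 8, 4]

# A list comprehension reads close to plain English:
# "keep each score s in scores if s is greater than 4"
high_scores = [s for s in scores if s > 4]

# The standard library provides common summaries without extra packages
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

print(high_scores)
print(mean_score, median_score)
```

For data analysis proper, third-party packages are normally used instead of `statistics`, but the same readable style carries over.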

Many ways exist to interface Python with other languages, meaning that Python works well as 'glue' in many development applications. A few other advantages of working in Python are:

* As a general-purpose language, Python allows for easier integration between your data science, data engineering and software development teams.
* Python's standard library includes support for multi-threading and multi-processing, which helps with running tasks concurrently or in parallel.
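
As a minimal sketch of the threading support in the standard library, the example below spreads a simple function over a pool of worker threads using `concurrent.futures`. The `word_count` helper and the `documents` list are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


# Hypothetical inputs, for illustration only
documents = ["a b c", "hello world", "one"]

# pool.map() applies word_count across the worker threads and
# returns the results in the same order as the inputs
with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(word_count, documents))

print(counts)
```

Note that in the standard CPython interpreter, threads mainly help with tasks that wait on input and output, such as reading files or calling web services. For CPU-heavy work, `ProcessPoolExecutor` from the same module offers the same interface using separate processes.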


## The Python Web Site

There are many online Python resources, almost all of which can be reached via the main
Python site: [http://www.python.org/](http://www.python.org/). From this site you can do many things, including:

* Download the latest copy of the core Python language.
* Find links, source code and documentation to many additional Python libraries.
* Find help on the use of Python in the online wiki.
* Join the "Python-Help" mailing list.
* Look for Python books and events.

## Python Versions

This course focuses on Python 3; at the time of writing, the most current version is 3.6. The rest of this section
elaborates on the history of Python versions and why this distinction is important.
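
To see which version you are running, the standard library's `sys` module can be queried. The sketch below is one way to do this; the minimum version in the guard is an assumption based on the version this course targets.

```python
import sys

# sys.version_info behaves like a tuple:
# (major, minor, micro, releaselevel, serial)
major = sys.version_info.major
minor = sys.version_info.minor
print("Running Python {}.{}".format(major, minor))

# Stop early if the interpreter is older than the course assumes
if sys.version_info < (3, 6):
    raise RuntimeError("This course material assumes Python 3.6 or later")
```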

Python 3.0 was released in December 2008, and introduced a range of improvements to clean up the base
language and improve its efficiency. However, some of these changes required fundamental
changes under the hood, and the decision was made to focus on new features and future
enhancements to the Python language, rather than patching them onto legacy code. For this
reason, Python 3 is intentionally not backward-compatible with previous versions.

This has some knock-on effects: even though the changes to syntax and usability
of the language are minimal, it has resulted in a protracted effort to port over the vast number
of third-party libraries that had previously been written.
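
To give a flavour of how small these changes look in everyday code, the sketch below shows three well-known Python 3 behaviours that differ from Python 2. It is not a complete list of differences, and the variable names are only illustrative.

```python
# print is a function in Python 3, so parentheses are required.
# (In Python 2 it was a statement, written as: print "hello")
print("hello")

# The / operator always performs true division in Python 3 ...
true_div = 7 / 2    # 3.5
# ... while // performs floor division
# (what / did for two integers in Python 2)
floor_div = 7 // 2  # 3

# Text strings are Unicode by default; bytes are a separate type
text = "caf\u00e9"             # four characters, the last one accented
encoded = text.encode("utf-8")  # five bytes once encoded as UTF-8

print(true_div, floor_div, len(text), len(encoded))
```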

Python 2.7.x remained in wide use for years after Python 3 was released, mostly because of the concern over library
compatibility mentioned above. However, Python 2 reached its end of life on 1 January 2020, and there will never be a Python 2.8 release, as all development of the language is now focused on Python 3.

This course is taught using Python 3 as it has more consistent syntax for beginners to become
familiar with, and will future-proof any code that trainees end up writing after the course.

**Tip: Python 3 Compatibility**

A list of the top 360 Python packages and their compatibility with Python 3 is available from [http://py3readiness.org/](http://py3readiness.org/). Also, a detailed discussion of the differences between Python 2 and Python 3 can be found at [https://wiki.python.org/moin/Python2orPython3](https://wiki.python.org/moin/Python2orPython3) 

## The Python Package Index (PyPI)

Similar to the R community, the Python community has actively produced many libraries of new classes and functions for Python (called packages), which extend the capabilities of Python in many directions. The Python Package Index (PyPI) is the main repository for these packages, and at the time of writing there are over 90,000 packages available to download.
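
Packages from PyPI are usually installed from the command line with `pip install` followed by the package name. As a hedged sketch, the snippet below checks from within Python whether a package is already installed. It assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library; the `installed_version` helper is a name made up for this example, and `pandas` is used only as an example package name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if pandas is installed, otherwise None
print(installed_version("pandas"))
```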