# How to set up the computer for this tutorial?

This tutorial assumes Python is installed on the computer.

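1) To verify, on Windows, press the "Windows" button and type `cmd` to open the command prompt.
On macOS or Linux, open a terminal. Then run `python --version`, which should print the installed Python version (on some macOS/Linux systems the command is `python3`).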

2) Make sure you have downloaded `Untitled.ipynb` (this file) to a newly created directory called `python-intro`.

3) Navigate to this directory in the command prompt/terminal using the `cd python-intro` command.

4) Create a Python virtual environment by executing `python -m venv venv` in the command prompt/terminal.

5) Activate the virtual environment by executing `venv\Scripts\activate` on Windows, or `. venv/bin/activate` on macOS/Linux.

6) Install the Python libraries used by the tutorial by executing `pip install jupyter pandas openpyxl`.

7) Start Jupyter by executing `jupyter notebook`.

<img src='https://github.com/marcindulak/python-intro/raw/main/python-intro.gif' width='960' align='center'>
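The same setup can also be checked from within Python itself. The following is a minimal sketch: the helper name `installed_version` is made up for this example, and the check assumes the libraries were installed with `pip` under the names used in step 6.

```python
import sys
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Inside an activated virtual environment, sys.prefix differs from sys.base_prefix
in_virtualenv = sys.prefix != sys.base_prefix
print("Python version:", sys.version.split()[0])
print("Virtual environment active:", in_virtualenv)

# The distribution names match the `pip install` command from step 6
versions = {name: installed_version(name) for name in ["jupyter", "pandas", "openpyxl"]}
for name, version in versions.items():
    print(name, "->", version if version is not None else "not installed")
```

If a library is reported as not installed, repeating steps 5 and 6 in the same command prompt/terminal is a reasonable first thing to try.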

# What is Python?

Python is an interpreted language. For our purposes, this means that no manual compilation step is necessary when running Python programs.
This results in a fast feedback loop and makes Python a popular tool for exploratory data analysis.

For contrast, here is an example of [Compiling the Linux kernel](https://www.youtube.com/watch?v=_va03wswz1E) on a fast machine.
It takes about a minute of waiting, during which one cannot interact with the program. In Python, on the other hand,
one can start the interpreter and give it instructions, which are executed without additional steps or waiting time.

![](https://imgs.xkcd.com/comics/compiling.png)

Source: https://3d.xkcd.com/303/

# Why is Python popular?

Python was first released in 1991, and in the mid-1990s and early 2000s,
research institutions started replacing some parts of their compiled code with Python.
Initially Python was used as a glue language to start and coordinate other programs,
but the appearance of fast numerical libraries for Python allowed it to enter the domain of high-performance computing.
For example, Python libraries like [numarray : A New Scientific Array Package for Python](https://www.semanticscholar.org/paper/numarray-%3A-A-New-Scientific-Array-Package-for-Greenfield-Miller/3727b1ace9097add6657b8d836b28e4dc4d8a1a9)
were used for operations on data from the Hubble telescope around the year 2000.

The term "data science" was almost nonexistent at that time, even though it can be traced back to 1974, see https://en.wikipedia.org/wiki/Data_science#Etymology. The focus of the 2000s was on numerical simulations,
and this resulted in the creation of more numerical libraries for Python, like https://numpy.org/ or https://scipy.org/.

In the early 2010s, "data science" became an opportunity for natural science students to enter higher-paid jobs
instead of becoming postdocs or possibly assistant professors. The popularity of Python among non-computer science
students, and the existence of early "data science" tools like https://en.wikipedia.org/wiki/Scikit-learn or https://en.wikipedia.org/wiki/Pandas_(software), reinforced the position of Python as a programming language.

# How popular is Python?

According to its online presence, as measured by various indexes like https://www.tiobe.com/tiobe-index/, https://pypl.github.io/PYPL.html or https://survey.stackoverflow.co/2022/, Python is in the top 3, if not the most popular language.
It competes with languages like Java, JavaScript or C/C++.

On the other hand, the Python project is maintained by volunteers at https://github.com/python, with
minor grants from digital technology giants like Meta https://pyfound.blogspot.com/2022/03/meta-deepens-its-investment-in-python.html
```
Meta has made a $300,000 Visionary level sponsorship of the Python Software Foundation
```
or Google https://cloud.google.com/blog/products/open-source/supporting-the-python-ecosystem
```
First, we’re announcing a donation of more than $350,000 to support three specific PSF projects, with a focus on improving the supply-chain security of the Python ecosystem.
```

The grants are sufficient to maintain **one** Developer-in-Residence role https://pyfound.blogspot.com/2021/07/ukasz-langa-is-inaugural-cpython.html

![](https://imgs.xkcd.com/comics/dependency.png)

Source: https://xkcd.com/2347

The name Python is a tribute to the British comedy group *Monty Python*. **And Now for Something Completely Different.**

# What is Jupyter?

We are currently in Jupyter.

Jupyter is an interactive development environment. Its name stems from the main supported languages: Julia, Python and R.

Jupyter is an implementation of the so-called [literate programming](https://en.wikipedia.org/wiki/Literate_programming) paradigm from 1984, in which one intersperses code with explanations and other documentation, such as images or videos, to make the program easier for humans to understand.

Until now we've been in documentation (so-called `Markdown`) cells. To create a new cell, click the plus **+** symbol at the top of the current tab.

In [None]:
# We are in a Code cell now. The switch between Code and Markdown is in the top menu.
# Lines starting with the pound sign # are comments, and are not executed by Python

In [None]:
# We are in another Code cell
# Let's execute the first Python statement to print a string of characters to the screen
print("Hello, World!")
# The output will appear below the cell

## Exercise 1

Python includes several built-in types: for example strings, integers, booleans and lists.

Examples of strings are `"Hello, World!"` and `"1"`; strings are written in quotes.

Examples of integers are `0` and `1`.

The only booleans are `True` and `False`. Capitalization matters in Python: `true` is not a boolean.

Examples of lists are `[1, 2]` and `[True, "a", 1]`.
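As a small sketch of the types listed above, the built-in `type` function reports the type of any value. The variable names below are only illustrative.

```python
# One example value for each of the built-in types mentioned above
examples = ["Hello, World!", "1", 0, True, [1, 2]]

# type(value).__name__ gives the short name of the value's type
type_names = [type(value).__name__ for value in examples]

for value, name in zip(examples, type_names):
    # repr() keeps the quotes around strings, so "1" and 1 are easy to tell apart
    print(repr(value), "is of type", name)
```

Note that `"1"` is reported as a string (`str`) and not as an integer (`int`), because it is quoted.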

In a Code cell below, use the `print` function to output the following results:
- the sum of `2` and `2`
- the sum of `"a"` and `"b"`
- the sum of an empty list `[]` and the list `[True]`, with the result multiplied by `3`.

Use the standard operators `+` for sum and `*` for multiplication.

What results do you expect?

# A surprise!

How is the output below possible?

In [1]:
print(2 + 2)

5


### How is this possible?

The answer is in the 2018 talk [I don't like notebooks](https://www.youtube.com/watch?v=7jiPeIFXb6U). In short, notebooks have internal state that can be modified by running cells out of order. The interactive behavior and format of notebooks make it extra challenging to apply some mainstream software engineering practices, like testing.
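One possible way to produce such output, not necessarily the one used above, is hidden state left behind by a cell that was run earlier and then deleted. The sketch below simulates this in a single block. The replacement `print` function is hypothetical and exists only for this illustration, and the output is captured in a string so that it can be inspected.

```python
import builtins
import contextlib
import io


# "Hidden" cell: run once, then deleted from the notebook.
# It shadows the built-in print with a version that adds 1 to every integer.
def print(*args, **kwargs):
    shifted = [arg + 1 if type(arg) is int else arg for arg in args]
    builtins.print(*shifted, **kwargs)


# Visible cell: looks innocent, but calls the shadowing function defined above.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    print(2 + 2)
surprise = captured.getvalue().strip()

# Point the name back at the built-in function, so later cells behave normally.
print = builtins.print
print("Captured output:", surprise)
```

Restarting the kernel and running all cells from top to bottom clears such leftover state.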

There are, however, organizations that try to make notebooks more robust. See the 2020 talk [I like notebooks](https://www.youtube.com/watch?v=9Q6sLbz37gk).

# Where to learn Python?

For a short, free, interactive tutorial that covers the basic concepts and gives a glimpse
of more advanced ones, like data science tools and functional programming, see https://www.learnpython.org/.
This tutorial is powered by https://datacamp.com, which offers many introductory courses.

For an extensive, free, non-interactive tutorial see https://swcarpentry.github.io/python-novice-inflammation/.

For a college-level introduction to Python that focuses on the basics of numerical methods,
see https://www.edx.org/xseries/mitx-computational-thinking-using-python.

For an in-depth live demonstration of most concepts of the Python language,
like functional and object-oriented programming, see https://www.udemy.com/user/fredbaptiste/.

For keeping up to date with the latest developments in Python, see the half-day workshops presented
during the PyCon conference https://www.youtube.com/c/pyconus/videos. They are available for free.

# Something insane, crazy - combine Excel files with Python!

As an introduction, here is part of a PyCon presentation by David Beazley, who is an engaging speaker.

[David Beazley: Generators: The Final Frontier - PyCon 2014](https://www.youtube.com/video/D1twn9kLmYg#t=2h16m39s)

In [None]:
import pandas as pd

# Create two Excel files to be combined
with pd.ExcelWriter('excel1.xlsx', mode='w') as writer:
    pd.DataFrame.from_dict({'id': [1, 2, 3]}).to_excel(writer, sheet_name='Sheet1', header=True, index=False)
with pd.ExcelWriter('excel2.xlsx', mode='w') as writer:
    pd.DataFrame.from_dict({'id': [4, 5, 6]}).to_excel(writer, sheet_name='Sheet1', header=True, index=False)

In [None]:
import pandas as pd

# Combine the files: read both inputs first, then write the combined rows to a new file
sheet1 = pd.read_excel('excel1.xlsx', sheet_name='Sheet1', header=0)
sheet2 = pd.read_excel('excel2.xlsx', sheet_name='Sheet1', header=0)
with pd.ExcelWriter('python.xlsx', mode='w') as writer:
    pd.concat([sheet1, sheet2]).to_excel(writer, sheet_name='Sheet1', header=True, index=False)

In [None]:
# Read the created Excel file and print the contents
print(pd.read_excel('python.xlsx', sheet_name='Sheet1'))

In [None]:
# Test that combining produced the expected contents
assert pd.read_excel('python.xlsx', sheet_name='Sheet1').to_dict('list') == {'id': [1, 2, 3, 4, 5, 6]}