# Introduction to Python

## Why Python?

### Python is popular!

MATLAB used to be one of the mainstream languages, but Python has now taken over that role in data science and computer science.

Measuring its popularity is in itself a data science problem!

Some data sources that help us measure Python's popularity:
- [TIOBE](https://tiobe.com/tiobe-index/): Based on Google search results
    - depends on how many people search for the language, e.g. how many people look up Python on Google
- [PYPL PopularitY](https://pypl.github.io/PYPL.html): Based on Google Trends
    - ranks languages by how often online courses and YouTube tutorials for them are looked up
    - focuses on beginners and which languages people are more likely to get started with (mostly Python)
- [GitHut 2.0](https://madnight.github.io/githut/#/pull_requests/2020/2): Based on GitHub
    - based on which languages people use in the code they upload to GitHub, a platform for sharing code and collaborating on it
    - leans towards more experienced users (JavaScript seems to be the most popular; Python comes in 2nd place)
    - MATLAB is at almost 0%: it is used mainly by a community of mathematicians, and the platform is often expensive
- [Redmonk](https://redmonk.com/sogrady/2020/07/27/language-rankings-6-20/): Based on GitHub + Stack Overflow
    - compares each language's popularity rank on GitHub with its rank on Stack Overflow
    - Stack Overflow is a question-and-answer website for programming
    - presented as a scatterplot, giving a visual picture of the most popular languages on both platforms
- [Kaggle](https://www.kaggle.com/kaggle-survey-2020): Based on a survey of data scientists
    - the data includes the data scientists' gender, age, ethnicity, education, etc.
    - respondents are mostly male, the 23-29 age group is the most common, and a master's degree is the most common education level; the survey also covers online courses taken, experience, pay rate, etc.

### Python is good!

Stable learning curve
- [Entertaining cartoon to look at](https://github.com/Dobiasd/articles/blob/master/programming_language_learning_curves.md)

Scalability of Computation (with help from other packages)
- recommended: study data science with some other form of mathematics

Useful packages:
- NumPy: Scientific Computing
- pandas: Data Analysis and Manipulation
- scikit-learn: Machine Learning
- Matplotlib: Visualizing Functions/Datasets
- Seaborn: Visualizing Statistical Data


## What is Python?

[Official Definition](https://www.python.org/doc/essays/blurb/): Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.

- 2 types of languages: interpreted and compiled
    - Interpreted: we write a statement and get the result right away (similar to MATLAB, where you'd write an expression and get a result)
        - easier to work with interactively
    - Compiled: you finish writing the whole program and compile it, then run it as a whole (C, C++, etc.)

- Object-oriented: focus on objects in the beginning of the course
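
As a small illustration of "object-oriented" and "dynamic semantics" in the definition above, here is a minimal sketch. The variable names are purely illustrative and not part of the original notes. It shows that every value is an object with a type and methods, and that the same name can refer to objects of different types at run time.

```python
# Every value in Python is an object with a type and methods.
x = 42
type_of_int = type(x).__name__   # name of the type of the object x refers to
bits = x.bit_length()            # integers have methods too

# Dynamic semantics: the same name can later refer to an object of another type.
x = "forty-two"
type_of_str = type(x).__name__
shout = x.upper()                # strings have their own methods

print(type_of_int, bits)
print(type_of_str, shout)
```

Because the interpreter checks types while the program runs, no type declarations are needed, which is part of what makes interactive work easy.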

Father of Python: [Guido van Rossum](https://gvanrossum.github.io/); see also the [history of Python](https://en.wikipedia.org/wiki/History_of_Python).

- Fond of animal names: Python, Anaconda, Pandas etc.

The Zen of Python and being Pythonic.

- Pythonic: a Python "accent" (sometimes people write code in a MATLAB style, and one would say it has a MATLAB accent; Pythonic code is simply code written with a Python accent).

- `this` is a module in the standard library; importing it prints the Zen of Python (shown below)

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## Python vs MATLAB

See [this helpful article](https://realpython.com/matlab-vs-python/#syntax-differences-between-matlab-and-python).

- Python indexing starts from 0. [Explanation here](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html)

In [2]:
array = [1, 2, 3]  # create a list
print(array[0])  # the first element has index 0; in MATLAB we use array(1)
print(array[-1])  # last element; in MATLAB we use array(end)
print(list(range(3, 9)))  # notice that this gives the integers from 3 to 8 (9 is excluded)
# this makes it easy to count the elements: here there are 9 - 3 = 6 elements in the generated list

1
3
[3, 4, 5, 6, 7, 8]
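
The same "start included, stop excluded" convention applies to slicing. The sketch below, with illustrative variable names, shows how it makes element counts and splits easy to reason about.

```python
numbers = list(range(3, 9))   # the integers 3 to 8
count = len(range(3, 9))      # stop - start = 9 - 3 elements

# Slices follow the same convention: start is included, stop is excluded.
head = numbers[:2]            # elements at index 0 and 1
tail = numbers[2:]            # everything from index 2 onwards

print(count)
print(head, tail)
print(head + tail == numbers)  # splitting at an index loses and repeats nothing
```

Since `numbers[:k]` stops just before index `k` and `numbers[k:]` starts at it, the two pieces always fit together without overlap.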


- Indentation. [Explanation here](https://web.archive.org/web/20070922223915/http://www.secnetix.de/~olli/Python/block_indentation.hawk)

In [3]:
if 5 > 2:
    print("Five is greater than two!")  # MATLAB uses if ... end; Python uses a colon and an indented block instead (no braces, no end)

Five is greater than two!
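
Indentation also determines where nested blocks begin and end. The following is a minimal sketch; the function `describe` is a hypothetical helper made up for illustration.

```python
def describe(n):
    # The indented lines below form the function body.
    if n % 2 == 0:
        # One level deeper: this block runs only when n is even.
        kind = "even"
    else:
        kind = "odd"
    # Back at the function's level: this line runs for every n.
    return f"{n} is {kind}"


for value in [5, 2]:
    print(describe(value))  # indented, so it is inside the loop
print("done")               # not indented, so it runs once after the loop
```

Moving the last `print` one level to the right would put it inside the loop, so it would run on every iteration. In Python, changing the indentation changes the meaning of the program.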


## How to learn and use Python well
### Resources
- [Anaconda](https://www.anaconda.com/products/individual)
- [Jupyter Notebook](https://jupyter.org/install)
- Brief [guide](https://python.swaroopch.com/) to Python
- [Tutorials](https://jakevdp.github.io/PythonDataScienceHandbook/) on Python Machine Learning
- [UCI datasets](https://archive.ics.uci.edu/ml/index.php) and Kaggle
- many more resources are available if you search for Python on GitHub; clone notebooks as needed

In [4]:
a=1000
b=a
b=1
print(a)

1000


In [5]:
a=[1000,1]
b=a
b=[1,1]
print(a)

[1000, 1]


In [7]:
a=[1000,1]
b=a
b[0]=1
print(a)

[1, 1]
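
The cells above differ in one respect: whether `b` is rebound to a new object, or the object shared by `a` and `b` is modified in place. The sketch below reproduces both cases in one place, using the `is` operator to check whether two names refer to the same object. The name `c` and the use of `list.copy()` are additions for illustration.

```python
a = [1000, 1]
b = a              # b is a second name for the same list object
print(b is a)      # both names refer to one object

b = [1, 1]         # rebinding: b now names a new list, a is untouched
print(b is a, a)

b = a              # make b refer to the original list again
b[0] = 1           # in-place change: visible through both names
print(a)

c = a.copy()       # an independent (shallow) copy of the list
c[0] = 1000        # changing the copy does not affect a
print(a, c)
```

The integer example behaves like the first list example: `b = 1` rebinds `b` and leaves `a` alone. Only an in-place modification such as `b[0] = 1` is visible through every name that refers to the same object.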
