# Python Basics

## Hello, World

In [71]:
print('Hello, World')

Hello, World


## Software Architecture - Data and Control

Computer software is all about processing data to obtain information: input data is processed by control logic to produce output data.

For example, look at the print('Hello, World') statement above.
We provided a text input, 'Hello, World', and got a text output displayed on the screen.

"print" is a group of code with a name ("print") that receives an input, performs an operation on it and produces an output; here, the operation is displaying the text on the screen. A named group of code like this is called a "function", and later in this session we will see how to create our own functions.
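
To make the "input, operation, output" idea concrete, here is a short sketch of a few ways print() can be called. It assumes Python 3; the sep parameter is part of the built-in print() function.

```python
# print() accepts several inputs and displays them separated by spaces
print('Hello,', 'World')            # Hello, World

# sep changes the separator placed between the inputs
print('Hello', 'World', sep=', ')   # Hello, World

# print() displays text but hands back no useful value: the result is None
result = print('Hello, World')
print(result)                       # None
```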


Today, data comes in many forms, but it can be broadly classified into three buckets: voice, image/video and numeric/text.

* Alexa and Siri are examples of voice processing applications.
* YouTube, Instagram and Snapchat are examples of image/video processing applications.
* Microsoft Excel is an example of a numeric processing application.

Control refers to the processing logic which takes data and produces end results for us.

As an example, Alexa takes voice commands (data) from users, interprets those commands and produces relevant results (a video, an image or a web page).



## Data

As discussed earlier, data can be broadly classified as voice, image/video and numeric/text.

Today we will focus primarily on numeric/text data: how to store such data, and how to analyze and manipulate it to suit our needs.

### Constants and Variables

#### Constants

In short, a constant is a value that does not change.

There are several kinds of constants in the real world, like:

* Whole numbers, such as 1, 3, 5, 7, 11, commonly called integers.
* Numbers such as 3.14, 1.414, 1.732, commonly called decimal numbers (floating-point numbers, or floats, in Python), which have an integer part and a fractional part.
* Character sequences such as 'Python', 'Java', 'Math', 'Physics', commonly called strings in programming languages.
* The Boolean constants True and False, e.g. "The Sun rises in the East" is a True statement.
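
Each kind of constant listed above corresponds to a built-in Python type, which the built-in type() function reports. A quick check, assuming Python 3:

```python
# type() reports which built-in type Python uses for a constant
print(type(7))         # <class 'int'>    an integer
print(type(3.14))      # <class 'float'>  a decimal (floating-point) number
print(type('Python'))  # <class 'str'>    a string
print(type(True))      # <class 'bool'>   a boolean
```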

#### Variables

A variable can be thought of as a name that refers to a value: it points to the value, or is a handle to it.

We need variables because we often want to keep data somewhere in our program so that we can recall it later and use it again.

In [72]:
number = 1
print(number)

1


"number" above is an example of a variable. Because the value stored in "number" is a whole number (an integer), it is known as an integer variable.
Variables are commonly described by the type of data they hold, e.g. integer, float, string, boolean, etc.
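
In Python the type actually belongs to the value, not to the variable name, so the same name can later refer to a value of a different type. A minimal sketch (Python 3 assumed):

```python
# In Python the type belongs to the value, not to the variable name
number = 1
print(type(number))             # <class 'int'>

# The same name can later be bound to a value of a different type
number = 'one'
print(type(number))             # <class 'str'>

# isinstance() checks whether a value is of a given type
print(isinstance(number, str))  # True
```

Calling a variable an "integer variable" is therefore shorthand for "a variable that currently refers to an integer".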

In [73]:
pi = 3.14  # a float variable, because it holds a decimal number
print(pi)

3.14


As you can see above, we used "#" to explain in plain words what the code does; this is called a comment. In Python, everything after a # on a line is a comment and is ignored when the code runs. Comments are important because they are how we explain our code to other programmers.



In [74]:
prog_lang = 'Python' #string variable
print(prog_lang)

Python


In [75]:
state = True
print(state)

True


### Collections

The data types explained above are known as the primitive or basic data types. From these primitive data types we can build more complex ones by combining them together.

#### Lists

A list is an ordered sequence of elements (variables or constants), written between square brackets [ ] and separated by commas.

In [76]:
f_pop_artists = ['Ariana Grande', 'Taylor Swift', 'Beyonce Knowles', 'Selena Gomez', 'Lady Gaga']
m_pop_artists = ['Ed Sheeran', 'Zayn Malik', 'Bruno Mars', 'Justin Bieber', 'Justin Timberlake']

In [77]:
print(f_pop_artists)

['Ariana Grande', 'Taylor Swift', 'Beyonce Knowles', 'Selena Gomez', 'Lady Gaga']


In the above, f_pop_artists is a variable of type list which contains several string constants.
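
A few everyday list operations, sketched on the same list. The extra name 'Dua Lipa' is an arbitrary entry used only for illustration.

```python
f_pop_artists = ['Ariana Grande', 'Taylor Swift', 'Beyonce Knowles', 'Selena Gomez', 'Lady Gaga']

# len() gives the number of elements in a list
print(len(f_pop_artists))                # 5

# 'in' checks whether a value is one of the elements
print('Taylor Swift' in f_pop_artists)   # True

# append() adds an element to the end (the list itself is modified)
f_pop_artists.append('Dua Lipa')         # illustrative extra entry
print(len(f_pop_artists))                # 6
```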

##### List Indexing

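To access individual elements of a list we use the [] operator with the position (index) of the element; this is called list indexing. Indexing starts at 0, so f_pop_artists[0] is the first element and f_pop_artists[2] is the third.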

In [78]:
print(f_pop_artists[0], '|', f_pop_artists[2])

Ariana Grande | Beyonce Knowles
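
Indexing also works from the end of the list, and a range of positions can be taken with a slice. A short sketch on the same list:

```python
f_pop_artists = ['Ariana Grande', 'Taylor Swift', 'Beyonce Knowles', 'Selena Gomez', 'Lady Gaga']

# Negative indices count from the end: -1 is the last element
print(f_pop_artists[-1])     # Lady Gaga

# A slice [start:stop] returns a new list from index start up to, but not including, stop
print(f_pop_artists[1:3])    # ['Taylor Swift', 'Beyonce Knowles']

# Indexing past the end raises an IndexError (valid indices here are 0 to 4)
try:
    f_pop_artists[5]
except IndexError as error:
    print('IndexError:', error)   # IndexError: list index out of range
```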


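Lists can also contain other lists (a list of lists). To reach an element of an inner list, index twice: the first index picks the inner list and the second picks the element inside it.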

In [79]:
pop_artists = [ f_pop_artists, m_pop_artists ]

In [80]:
print(pop_artists[0])
print(pop_artists[1])

['Ariana Grande', 'Taylor Swift', 'Beyonce Knowles', 'Selena Gomez', 'Lady Gaga']
['Ed Sheeran', 'Zayn Malik', 'Bruno Mars', 'Justin Bieber', 'Justin Timberlake']


In [81]:
print(pop_artists[0][0])

Ariana Grande


In [82]:
print(pop_artists[1][0])

Ed Sheeran


## Control

As discussed earlier, control refers to the processing logic which consumes data and produces results. 

There are different ways in which the processing logic can flow in a program; here are the basic ones which you will find in any programming language:

* Sequence (or sequential)
* Conditional (or branching)
* Iteration (or looping)


### Sequence

A sequence is simply a set of instructions to the computer, executed one after another in the order they are written. You can think of each instruction as an individual 'command' to the computer.

Much of programming comes down to finding the right sequence of instructions to give the computer to solve a particular problem.

In [83]:
x = 1
city = 'London'
country = 'United Kingdom'
planet = 'Earth'
age = 4.2e9  # scientific notation: 4.2 * 10**9, stored as a float
print(age)
print((x + 24) ** 2)

4200000000.0
625
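
Because the statements run strictly in order, changing the order can change the result. A small sketch reusing the (x + 24) ** 2 calculation from the cell above:

```python
# Statements run top to bottom, so each line sees the effects of the lines before it
x = 1
x = x + 24      # x is now 25
x = x ** 2      # x is now 625
print(x)        # 625

# Swapping the two updates computes (1 ** 2) + 24 instead
y = 1
y = y ** 2      # y is now 1
y = y + 24      # y is now 25
print(y)        # 25
```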


### Conditional/Branching

Sometimes the logic we want to implement depends on choices or decisions that can only be made while the program runs.

In [84]:
age = 3

In [85]:
if age < 5:
    print('Nursery')
elif age <= 7:  # elif is short for 'else if'; only checked when age >= 5
    print('Reception')
elif age <= 13:
    print('Junior School')
elif age <= 18:
    print('Senior School')
elif age <= 24:
    print('University')
else:
    print('Working')

Nursery
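
The branches of an if/elif chain are tested from top to bottom and only the first matching branch runs, so boundary values are worth checking. A minimal sketch that wraps branching with the age cut-offs above in a function (functions are introduced later in this session) and makes every age, including 5, fall into exactly one branch. The name school_stage is hypothetical, chosen for illustration.

```python
def school_stage(age):
    """Return the stage of education for a given age (hypothetical helper)."""
    if age < 5:
        return 'Nursery'
    elif age <= 7:        # only reached when age >= 5
        return 'Reception'
    elif age <= 13:
        return 'Junior School'
    elif age <= 18:
        return 'Senior School'
    elif age <= 24:
        return 'University'
    else:
        return 'Working'

# Try a few ages, including the boundary values
for age in [3, 5, 7, 8, 18, 30]:
    print(age, school_stage(age))
```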


### Iteration

Sometimes the logic we wish to implement requires us to repeat a step several times. This is known as iteration: we iterate over a number of steps.
Iteration is commonly called 'looping'. In the example below, range(5) produces the numbers 0, 1, 2, 3 and 4, and the loop body runs once for each of them.

In [86]:
for count in range(5):
    print('{} times {} is {}'.format(count, count, count * count))

0 times 0 is 0
1 times 1 is 1
2 times 2 is 4
3 times 3 is 9
4 times 4 is 16
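
A for loop does not have to use range(); it can walk over the elements of a list directly. Python also has a while loop, which repeats as long as a condition stays True. A short sketch:

```python
f_pop_artists = ['Ariana Grande', 'Taylor Swift', 'Beyonce Knowles', 'Selena Gomez', 'Lady Gaga']

# range(5) produces 0 up to, but not including, 5
print(list(range(5)))            # [0, 1, 2, 3, 4]

# Loop over the elements of a list directly, no index needed
for artist in f_pop_artists:
    print(artist)

# enumerate() supplies the index and the element together
for position, artist in enumerate(f_pop_artists):
    print(position, artist)

# A while loop repeats as long as its condition stays True
count = 0
while count < 3:
    print('count is', count)
    count = count + 1            # without this line the loop would never end
```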


### Operations

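Python supports several mathematical and logical operations, including arithmetic (+, -, *, /) and comparisons (==, !=, >, <, >=, <=). A comparison always gives a boolean, True or False.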

In [87]:
x = 10
y = 20

In [88]:
x + y, x - y, x * y, x / y

(30, -10, 200, 0.5)

In [89]:
x == y, x != y

(False, True)

In [90]:
x > y, x < y, x >= y, x <= y

(False, True, False, True)
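
A few more operators that often come up: floor division, remainder and exponentiation, plus the logical operators and, or and not for combining True/False values. A sketch reusing x = 10 and y = 20:

```python
x = 10
y = 20

# Floor division, remainder (modulo) and exponentiation
print(y // 3, y % 3, x ** 2)    # 6 2 100

# '/' always produces a float, even when the division is exact
print(y / x)                    # 2.0

# Logical operators combine True/False values
print(x < y and y < 30)         # True
print(x > y or y == 20)         # True
print(not x == y)               # True
```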

### Functions

A function is a reusable piece of code. It allows us to write a piece of logic which can be called multiple times within our software.

A function usually takes some variables as input and then uses these inputs to generate an output.

In [91]:
def greeter(name, greeting):
    print(greeting + ' ' + name)

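The 'def' keyword defines a function; the names between ( ... ) are the function's inputs, called parameters or arguments. The indented lines under the def line form the function's body, which runs each time the function is called.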

In [92]:
greeter(f_pop_artists[0], 'Hello')

Hello Ariana Grande


In [93]:
greeter(m_pop_artists[0], 'Hi!')

Hi! Ed Sheeran
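
greeter() displays its result but does not hand anything back to the caller. A function can instead return a value with the return keyword, and a parameter can have a default value. A minimal sketch; make_greeting is a hypothetical name used for illustration:

```python
def make_greeting(name, greeting='Hello'):
    """Build a greeting string and return it instead of printing it (hypothetical helper)."""
    return greeting + ' ' + name

# The returned value can be stored in a variable and reused
message = make_greeting('Ed Sheeran')                    # greeting falls back to its default, 'Hello'
print(message)                                           # Hello Ed Sheeran
print(make_greeting('Lady Gaga', 'Hi!'))                 # Hi! Lady Gaga
print(make_greeting(name='Bruno Mars', greeting='Hey'))  # Hey Bruno Mars
```

Returning a value rather than printing it lets the caller decide what to do with the result: print it, store it, or pass it to another function.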


### Modules

A Python module is a collection of useful data types and functions that software engineers have developed and made available for general use.

Python is a widely used language, and many developers have made their code available for others to use.

pandas (created by Wes McKinney at AQR Capital) is a handy module for storing and manipulating tabular and time-series datasets. [https://pandas.pydata.org/]

seaborn is a popular module for data visualization (we will explore seaborn in Session 2). [https://seaborn.pydata.org/]

There are several other popular modules, such as scikit-learn (machine learning), numpy (fast numerical computing with arrays), scipy (scientific computing) and matplotlib (visualization).

##### Use the 'import' statement to utilise the services of a Python module

In [94]:
import numpy as np
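
There are a few common forms of the import statement. This sketch uses only modules from Python's standard library (math and statistics), so nothing extra needs to be installed:

```python
# Import a whole module and use its contents via the module name
import math
print(math.sqrt(16))               # 4.0

# Import a module under a shorter alias (the same pattern as 'import numpy as np')
import statistics as stats
print(stats.mean([1, 2, 3, 4]))    # 2.5

# Import a single name directly from a module
from math import pi
print(round(pi, 2))                # 3.14
```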

In [95]:
dice_throws = []                              # start with an empty list
for count in range(25):                       # repeat 25 times
    dice_throws.append(np.random.choice(6))   # add a random whole number from 0 to 5

In [96]:
dice_throws

[2, 1, 3, 4, 0, 0, 2, 3, 5, 0, 5, 5, 0, 4, 5, 4, 0, 3, 0, 1, 2, 0, 5, 0, 4]
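
Note that np.random.choice(6) picks from 0 to 5, which is why the output above contains zeros. To simulate the faces of a real die (1 to 6), shift the result by 1, or draw from 1 to 6 directly. A sketch using NumPy's random Generator (available in NumPy 1.17 and later); the seed value 42 is arbitrary and only makes the run reproducible:

```python
import numpy as np

# A seeded generator makes the "random" throws reproducible
rng = np.random.default_rng(42)

# integers(1, 7) draws from 1 up to, but not including, 7: the faces of a six-sided die
dice_throws = rng.integers(1, 7, size=25)
print(dice_throws)

# The same range using the approach from the cell above: shift choice(6) by 1
one_throw = np.random.choice(6) + 1
print(one_throw)
```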

Python modules make it easy to develop new software by using existing code as building blocks.

### DataFrames

Sometimes data is available in a tabular or column-oriented structure, where each row in the table holds information about one item (for example, one song) and each column holds one attribute of it (for example, its name or tempo).

DataFrames are a good way to analyze and process such tabular information. We will be using DataFrames in Session 2, so here is a short introduction.
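
A DataFrame can also be built directly from a Python dictionary, where each key becomes a column. This sketch copies a few rows (and only some columns) from the Spotify output shown later in this section; the variable name tracks is just for illustration:

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values, row by row
tracks = pd.DataFrame({
    'name': ["God's Plan", 'SAD!', 'rockstar (feat. 21 Savage)', 'Shape of You', 'Perfect'],
    'artists': ['Drake', 'XXXTENTACION', 'Post Malone', 'Ed Sheeran', 'Ed Sheeran'],
    'duration_ms': [198973, 166606, 218147, 233713, 263400],
    'time_signature': [4, 4, 4, 4, 3],
})

print(tracks.shape)          # (5, 4): 5 rows, 4 columns
print(tracks['artists'])     # a single column is a pandas Series
print(tracks.iloc[0])        # the first row, selected by position
```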

Dataset source: https://www.kaggle.com/nadintamer/top-spotify-tracks-of-2018

In [97]:
import pandas as pd #Check section 'Modules' above

In [98]:
top_tracks = pd.read_csv('top_tracks_2018.csv')

In [99]:
top_tracks.head()

|   | name | artists | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | God's Plan | Drake | 0.754 | 0.449 | 7 | -9.211 | 1 | 0.109 | 0.0332 | 8.3e-05 | 0.552 | 0.357 | 77.169 | 198973 | 4 |
| 1 | SAD! | XXXTENTACION | 0.74 | 0.613 | 8 | -4.88 | 1 | 0.145 | 0.258 | 0.00372 | 0.123 | 0.473 | 75.023 | 166606 | 4 |
| 2 | rockstar (feat. 21 Savage) | Post Malone | 0.587 | 0.535 | 5 | -6.09 | 0 | 0.0898 | 0.117 | 6.6e-05 | 0.131 | 0.14 | 159.847 | 218147 | 4 |
| 3 | Psycho (feat. Ty Dolla $ign) | Post Malone | 0.739 | 0.559 | 8 | -8.011 | 1 | 0.117 | 0.58 | 0.0 | 0.112 | 0.439 | 140.124 | 221440 | 4 |
| 4 | In My Feelings | Drake | 0.835 | 0.626 | 1 | -5.833 | 1 | 0.125 | 0.0589 | 6e-05 | 0.396 | 0.35 | 91.03 | 217925 | 4 |


In [100]:
top_tracks.columns #You can read about the meaning of each column @ https://www.kaggle.com/nadintamer/top-spotify-tracks-of-2018

Index(['name', 'artists', 'danceability', 'energy', 'key', 'loudness', 'mode',
       'speechiness', 'acousticness', 'instrumentalness', 'liveness',
       'valence', 'tempo', 'duration_ms', 'time_signature'],
      dtype='object')

In [101]:
top_tracks.artists.value_counts().head() #Counts the number of times each artist occurs in the dataset; shows the top 5 artists

XXXTENTACION    6
Post Malone     6
Drake           4
Marshmello      3
Ed Sheeran      3
Name: artists, dtype: int64

In [102]:
top_tracks[top_tracks.artists == 'Ed Sheeran'] #Finds the tracks by Ed Sheeran

|   | name | artists | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | Shape of You | Ed Sheeran | 0.825 | 0.652 | 1 | -3.183 | 0 | 0.0802 | 0.581 | 0.0 | 0.0931 | 0.931 | 95.977 | 233713 | 4 |
| 29 | Perfect | Ed Sheeran | 0.599 | 0.448 | 8 | -6.312 | 1 | 0.0232 | 0.163 | 0.0 | 0.106 | 0.168 | 95.05 | 263400 | 3 |
| 84 | Perfect Duet (Ed Sheeran & Beyoncé) | Ed Sheeran | 0.587 | 0.299 | 8 | -7.365 | 1 | 0.0263 | 0.779 | 0.0 | 0.123 | 0.356 | 94.992 | 259550 | 3 |


In [103]:
top_tracks[top_tracks.time_signature != 4] #Finds the tracks with a time signature other than 4

|   | name | artists | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29 | Perfect | Ed Sheeran | 0.599 | 0.448 | 8 | -6.312 | 1 | 0.0232 | 0.163 | 0.0 | 0.106 | 0.168 | 95.05 | 263400 | 3 |
| 49 | Call Out My Name | The Weeknd | 0.489 | 0.598 | 1 | -4.929 | 1 | 0.036 | 0.218 | 0.0 | 0.35 | 0.172 | 134.045 | 228373 | 3 |
| 84 | Perfect Duet (Ed Sheeran & Beyoncé) | Ed Sheeran | 0.587 | 0.299 | 8 | -7.365 | 1 | 0.0263 | 0.779 | 0.0 | 0.123 | 0.356 | 94.992 | 259550 | 3 |
| 97 | No Brainer | DJ Khaled | 0.552 | 0.76 | 0 | -4.706 | 1 | 0.342 | 0.0733 | 0.0 | 0.0865 | 0.639 | 135.702 | 260000 | 5 |


In [104]:
top_tracks[top_tracks.duration_ms > 300000] #Finds the tracks longer than 5 minutes (300,000 ms)

|   | name | artists | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | Te Boté - Remix | Nio Garcia | 0.903 | 0.675 | 11 | -3.445 | 0 | 0.214 | 0.542 | 1.3e-05 | 0.0595 | 0.442 | 96.507 | 417920 | 4 |
| 42 | SICKO MODE | Travis Scott | 0.834 | 0.73 | 8 | -3.714 | 1 | 0.222 | 0.00513 | 0.0 | 0.124 | 0.446 | 155.008 | 312820 | 4 |
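
Several conditions can be combined in one filter: & means 'and', | means 'or', and each condition needs its own parentheses because & and | bind more tightly than comparisons such as ==. A minimal sketch on a hand-built DataFrame whose rows are copied from the outputs above (only some columns kept); tracks['artists'] is equivalent to the tracks.artists shorthand used earlier:

```python
import pandas as pd

# A few rows copied from the Spotify outputs above
tracks = pd.DataFrame({
    'name': ['Shape of You', 'Perfect', 'Call Out My Name', 'No Brainer', 'SICKO MODE'],
    'artists': ['Ed Sheeran', 'Ed Sheeran', 'The Weeknd', 'DJ Khaled', 'Travis Scott'],
    'duration_ms': [233713, 263400, 228373, 260000, 312820],
    'time_signature': [4, 3, 3, 5, 4],
})

# Both conditions must hold: tracks by Ed Sheeran whose time signature is not 4
ed_not_in_4 = tracks[(tracks['artists'] == 'Ed Sheeran') & (tracks['time_signature'] != 4)]
print(ed_not_in_4['name'].tolist())    # ['Perfect']

# Either condition may hold: longer than 5 minutes, or a time signature of 5
long_or_odd = tracks[(tracks['duration_ms'] > 300000) | (tracks['time_signature'] == 5)]
print(long_or_odd['name'].tolist())    # ['No Brainer', 'SICKO MODE']

# Sort by duration, longest first, and keep the top two names
longest = tracks.sort_values('duration_ms', ascending=False)['name'].head(2).tolist()
print(longest)                         # ['SICKO MODE', 'Perfect']
```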
