# 1. Learning Python

Before learning about machine learning, you need to know the basics of Python. This tutorial covers types of variables, casting, objects, methods, parameters, and more. It will help you follow the content of the next stage, which introduces machine learning.

## Jupyter Notebook

Jupyter Notebook is a locally hosted web application that provides an interactive environment for Python. Notebooks run Python code through the IPython kernel and can be used for many subjects, including machine learning. The work is saved in a notebook, or .ipynb file. Jupyter Notebook is well suited to organizing code, since it separates code into cells that give a clean look.

## Variables

Unlike many other languages, Python does not require you to declare what type a variable is. To make a variable, simply give it a name and assign it a value.

In [1]:
variable1 = 1
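
As a small illustration of the point above (a sketch, not part of the original notebook), the built-in `type` function shows that the type belongs to the value currently stored, so the same name can later hold a value of a different type.

```python
# A variable's type is determined by the value it currently holds.
variable1 = 1
print(type(variable1))  # <class 'int'>

# The same name can later be assigned a value of a different type.
variable1 = "one"
print(type(variable1))  # <class 'str'>
```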

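The basic built-in types include integers, strings, booleans, and floats. Python has no separate double or char type: a float is already double precision, and a single character is simply a string of length one.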

In [2]:
integerVariable = 5
stringVariable = "Hello world"
booleanVariable = True
floatVariable = 12.34
charVariable = 'a'#A single character is still a string (str) in Python
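
The introduction mentions casting, so here is a minimal sketch of it using the variables from the cell above. It only uses Python's built-in `type`, `int`, `float`, and `str` functions; the printed values in the comments are what these calls return.

```python
integerVariable = 5
floatVariable = 12.34
charVariable = 'a'

# type() reports the type of the value stored in a variable.
print(type(floatVariable))  # <class 'float'>
print(type(charVariable))   # <class 'str'>

# Casting converts a value from one type to another.
print(float(integerVariable))      # 5.0
print(int(floatVariable))          # 12 (the decimal part is dropped, not rounded)
print(str(integerVariable) + "!")  # 5!
print(int("42") + 1)               # 43
```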

However, variables can also store other values, such as lists. A list is a collection of values stored in one variable. A list can hold values of any type and is not limited to one type per list.

In [3]:
listExample = ["Hello", 5, True, 12.34, 'a']
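
Once a list exists, its values can be read and changed by position. The sketch below reuses `listExample` from the cell above; the value `"new item"` is made up for illustration.

```python
listExample = ["Hello", 5, True, 12.34, 'a']

print(listExample[0])    # Hello (indexing starts at 0)
print(listExample[-1])   # a (negative indices count from the end)
print(len(listExample))  # 5

listExample.append("new item")  # Adds a value to the end of the list
listExample[1] = 6              # Replaces the value at index 1
print(listExample)  # ['Hello', 6, True, 12.34, 'a', 'new item']
```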

Lists can also store lists, making a two-dimensional list.

In [4]:
list1 = [1, 11, 111]
list2 = [2, 22, 222]
list3 = [3, 33, 333]
listTotal = [list1, list2, list3]
print(listTotal)

[[1, 11, 111], [2, 22, 222], [3, 33, 333]]
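
To read a value out of a two-dimensional list, index twice: first the inner list, then the position inside it. A short sketch using the same values as above:

```python
listTotal = [[1, 11, 111], [2, 22, 222], [3, 33, 333]]

print(listTotal[1])     # [2, 22, 222] (the second inner list)
print(listTotal[1][2])  # 222 (the third value of the second inner list)
```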


## Methods

Methods (called functions in Python when they are defined outside a class) are segments of code that do not run when they are defined. They run only when called. Methods are useful for repeated tasks, since calling one method replaces the repeated code and shortens the program. An example of using a method is shown below.

In [14]:
#Without Methods
print(3*1)
print(3*2)
print(3*3)
print(3*4)

#With methods
def timesThree(num):#This is how you declare a method. "num" in this case is a parameter.
    return num*3
print(timesThree(1))#To call on a method, type its name, and input any parameters
print(timesThree(2))
print(timesThree(3))
print(timesThree(4))

3
6
9
12
3
6
9
12
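
A method can take more than one parameter, and a parameter can have a default value that is used when the caller leaves it out. The function `times` below is a hypothetical generalization of `timesThree`, written only to illustrate this.

```python
def times(num, factor=3):
    """Return num multiplied by factor (factor defaults to 3)."""
    return num * factor

print(times(4))                 # 12 (factor falls back to its default of 3)
print(times(4, 5))              # 20
print(times(num=2, factor=10))  # 20 (parameters can also be passed by name)
```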


Methods are useful in machine learning, for example when visualizing each feature with Matplotlib (introduced below). Without methods, the code would be hundreds of lines long because of the large number of features. With methods, it stays short.

## Loops

Loops clean up code and make it much easier to run a large number of similar commands. A loop iterates over a sequence of elements, whether that is the characters in a string or the numbers from 1 to 10. Loops can be used in tandem with methods to simplify code further. The example below continues the methods example to show what loops can do.

In [17]:
#Without loops
def timesThree(num):
    return num*3
print(timesThree(1))
print(timesThree(2))
print(timesThree(3))
print(timesThree(4))

#With loops
def timesThree(num):
    return num*3
for i in range(1,5):#This is how you declare a for loop; range(1,5) produces the numbers 1-4 (the end value, 5, is not included)
    print(timesThree(i))
    
#You can also iterate through a string with a loop
for t in "hello world":
    print(t)

3
6
9
12
3
6
9
12
h
e
l
l
o
 
w
o
r
l
d
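
Loops can also walk through a list and collect results as they go. The sketch below reuses `timesThree`; the names `numbers` and `results` are illustrative. `enumerate` is a Python built-in that yields each position together with its value.

```python
def timesThree(num):
    return num * 3

numbers = [1, 2, 3, 4]
results = []
for n in numbers:
    results.append(timesThree(n))  # Store each result instead of printing it
print(results)  # [3, 6, 9, 12]

# enumerate gives the index and the value at the same time
for index, value in enumerate(results):
    print(index, value)
```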


## Packages

Packages are important to learn, as it is not practical to write every function yourself. To make programming easier, packages provide premade classes and methods for you to use. To import packages, use the code below.

In [5]:
#To import a package, type in "import <package name>"
import h5py
#You can also import a package and give it a shorter name (an alias).
#The keyword "as" assigns that alias, so "np" can be used in place of "numpy".
import numpy as np
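
The same import forms work for any package. The sketch below uses `math` from Python's standard library, chosen here only because it needs no installation, and also shows the `from ... import ...` form for importing a single name.

```python
import math
print(math.sqrt(16))  # 4.0

import math as m      # Same package, reached through the alias "m"
print(m.floor(2.7))   # 2

from math import pi   # Imports only the name "pi" from the package
print(round(pi, 2))   # 3.14
```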

### NumPy

NumPy is a package used to create multidimensional array objects for storing data. It includes many functions for linear algebra and other mathematical and scientific purposes. In this case, NumPy arrays are used to store information on the constituents of the jets in the dataset.

In [11]:
#To make a basic NumPy array, first import the package as an object
import numpy as np

#Next, make the array by using the function "np.array". 
#Within the parentheses, enter the values for which you want the array to store. 
numPyArray = np.array([[1,2],[3,4],[5,6]])

#The NumPy package also has attributes that help you learn more about what you are working with.
#Here are a couple examples below:
print(numPyArray.ndim)#Gives the number of dimensions (axes) of the array
print(numPyArray.shape)#Gives the shape of the array (its length along each dimension)
print(numPyArray.size)#Gives the total number of elements in the array
print(numPyArray.dtype)#Gives the data type of the elements (may differ by platform)
print(numPyArray.itemsize)#Gives the size in bytes of each element in the array

#Here are some useful methods and functions for NumPy arrays.
numPyArray = np.array([1,2,3,4,5,6])
numPyArray = numPyArray.reshape(3,2)#Returns the array in the specified shape; assign the result, since reshape does not change the array in place
numPyArray2 = np.arange(7)#Makes an array with the values from 0 to n-1, where n is the value inputted (here 0 to 6).

2
(3, 2)
6
int64
8
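
The sketch below shows `reshape` and `np.arange` in action, along with element-wise arithmetic and a sum along one axis. The names `flat` and `grid` are illustrative. Note that `reshape` hands back a new array object, so its result has to be stored in a variable.

```python
import numpy as np

flat = np.arange(6)        # Values 0 to 5 in a one-dimensional array
grid = flat.reshape(3, 2)  # New array object with 3 rows and 2 columns

print(flat.shape)  # (6,) (the original keeps its shape)
print(grid.shape)  # (3, 2)
print(grid[2, 1])  # 5 (row 2, column 1; both indices start at 0)

# Arithmetic applies to every element at once
print((grid * 3).tolist())        # [[0, 3], [6, 9], [12, 15]]
# axis=0 sums down each column
print(grid.sum(axis=0).tolist())  # [6, 9]
```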


### H5Py

H5Py is a package used to store data in and open Hierarchical Data Format Version 5 (HDF5) files. HDF5 files are well suited to large quantities of data, as the format supports compressing the data to a manageable size for the file to be transferred. H5Py offers a lot of flexibility in what you can do with a dataset. For now, H5Py will mainly be used to open HDF5 files.
To run the code below, download the dataset from this website: https://cernbox.cern.ch/index.php/s/AgzB93y3ac0yuId?path=%2Ffixed.

In [None]:
#To use H5Py, first import the package.
import h5py

#To open an HDF5 file, use the function below:
f = h5py.File('processed-pythia82-lhc13-all-pt1-50k-r1_h022_e0175_t220_nonu_truth.z', 'r')#Put your own file path in here.
print(list(f.keys()))#Shows all keys in the file (printed explicitly, since a cell only displays its last expression)
treeArray = f['t_allpar_new'][()]#Retrieves the dataset in the file, and sets it to the variable "treeArray". 
                                 #The empty tuple indexing retrieves all values in the set.
print(treeArray.dtype.names)#Prints out each column from the dataset.

### MatPlotLib

Matplotlib is a package used to visualize data in the form of graphs. In this tutorial, it will be used for plots such as the ROC curve, along with its AUC (the area under that curve). Matplotlib will come up later in the tutorial, in the section on training and validating the model.

### Pandas

Pandas is a tool for manipulating data and reformatting it easily. It works closely with NumPy, so it can use many of NumPy's functionalities as well. Pandas can be used to open files as well as to build datasets. An example of Pandas being used is shown below:

In [13]:
import pandas as pd #Import the package as an object "pd"

df = pd.DataFrame({#The function used to build the set, called a DataFrame for Pandas
    "Name": ["Braund, Mr. Owen Harris",#Column name is "Name"
    "Allen, Mr. William Henry",
    "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],#Column name is "Age"
    "Gender": ["male", "male", "female"]}#Column name is "Gender"
)

print(df)

#To open an HDF5 file in Pandas, use the function below:
opened_set = pd.read_hdf('test.h5', key = 'table')#Reads the object stored under the key 'table' into a table (DataFrame).
                                                  #The extension can be ".z" or ".h5", as long as the file contents are HDF5.

                       Name  Age  Gender
0   Braund, Mr. Owen Harris   22    male
1  Allen, Mr. William Henry   35    male
2  Bonnell, Miss. Elizabeth   58  female
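
Once a DataFrame exists, columns are selected by name and rows can be filtered with a condition. The sketch below rebuilds the same DataFrame so that it runs on its own; the variable `older` and the age threshold of 30 are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris",
             "Allen, Mr. William Henry",
             "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
    "Gender": ["male", "male", "female"],
})

print(df.shape)        # (3, 3) (3 rows, 3 columns)
print(df["Age"].max()) # 58 (a single column, selected by name)

older = df[df["Age"] > 30]     # Keeps only the rows where Age is above 30
print(older["Name"].tolist())  # ['Allen, Mr. William Henry', 'Bonnell, Miss. Elizabeth']
```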


One use of Pandas in machine learning is collecting all the features and labels into a single DataFrame for training the model.

In [None]:
import pandas as pd #Import the package as an object "pd"

features = ['j_zlogz', 'j_c1_b0_mmdt', 'j_c1_b1_mmdt', 'j_c2_b1_mmdt', 'j_d2_b1_mmdt', 'j_d2_a1_b1_mmdt',
            'j_m2_b1_mmdt', 'j_n2_b1_mmdt', 'j_mass_mmdt', 'j_multiplicity']
labels = ['j_t']

features_labels_df = pd.DataFrame(treeArray,columns=features+labels) #Combines the columns for each feature and
                                                                     #label into one set, isolating them from
                                                                     #the other data. "treeArray" comes from the H5Py example.
features_labels_df = features_labels_df.drop_duplicates() #Drops any duplicate rows remaining in the set

print(features_labels_df)
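
The cell above needs the downloaded dataset to run. As a rough, self-contained sketch of the same steps, the example below builds a tiny structured NumPy array as a stand-in for `treeArray`. The array `toyArray`, its values, and the extra column `j_other` are all made up for illustration; only the column names `j_mass_mmdt`, `j_multiplicity`, and `j_t` come from the tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for treeArray: a structured array with named columns.
toyArray = np.array(
    [(0.5, 10, 1, 100.0), (0.7, 12, 0, 200.0), (0.5, 10, 1, 300.0)],
    dtype=[("j_mass_mmdt", "f8"), ("j_multiplicity", "i8"),
           ("j_t", "i8"), ("j_other", "f8")],
)
print(toyArray.dtype.names)  # ('j_mass_mmdt', 'j_multiplicity', 'j_t', 'j_other')

features = ["j_mass_mmdt", "j_multiplicity"]
labels = ["j_t"]

# Keep only the feature and label columns, leaving "j_other" behind
toy_df = pd.DataFrame(toyArray, columns=features + labels)
toy_df = toy_df.drop_duplicates()  # Rows 0 and 2 are now identical, so one is dropped
print(toy_df.shape)  # (2, 3)

# Split into feature values and label values as NumPy arrays
X = toy_df[features].to_numpy()
y = toy_df[labels].to_numpy()
print(X.shape, y.shape)  # (2, 2) (2, 1)
```

Selecting the columns before dropping duplicates matters here: the first and third rows differ only in `j_other`, so they become duplicates once that column is left out.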

### Keras and TensorFlow

Keras and TensorFlow are packages used specifically for machine learning. They are mainly geared towards building models that house all the layers, and towards training those models. However, as with Matplotlib, the use of these packages will be covered in the next tutorial.