# Getting Started for our next Python-based Workshop

------------------------------------------------------------------------------------------

# Loops



Let's recall the idea of making __lists__ and __dictionaries__, two ways we can store information. Lists can be made using either:
- ['a','b','c']
- list(('a', 'b', 'c')), where list() takes a single iterable such as a tuple


Here are some methods you can use with lists:

| Method |	Description |
|----|----|
|append() |	Adds an element at the end of the list |
|clear() |	Removes all the elements from the list |
|copy() |	Returns a copy of the list |
|count() |	Returns the number of elements with the specified value |
|extend() |	Adds the elements of a list (or any iterable) to the end of the current list |
|index() |	Returns the index of the first element with the specified value |
|insert() |	Adds an element at the specified position |
|pop() |	Removes and returns the element at the specified position (the last element by default) |
|remove() |	Removes the first item with the specified value |
|reverse() |	Reverses the order of the list |
|sort() |	Sorts the list |

In [None]:
list1 = ['a','b','c']
list2 = list( (1, 2, 3) )

print(list1)
print(list2)

In [None]:
# Make a list of local large universities

university = ['Rutgers', 'Princeton', 'TCNJ', 'UPenn', 'NYU', 'Columbia']



In [None]:
university.append("Penn State")

print(university)
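Here is a short sketch of a few more methods from the table above. The list name `schools` is made up for illustration.

```python
# Try several list methods on a small list of universities
schools = ['Rutgers', 'Princeton', 'TCNJ']

schools.insert(1, 'NYU')          # add 'NYU' at index 1
print(schools)                    # ['Rutgers', 'NYU', 'Princeton', 'TCNJ']

last = schools.pop()              # remove and return the last element
print(last)                       # TCNJ

schools.remove('NYU')             # remove the first 'NYU' found
schools.extend(['UPenn', 'Columbia'])
schools.sort()                    # sort alphabetically, in place
print(schools)                    # ['Columbia', 'Princeton', 'Rutgers', 'UPenn']
print(schools.index('Rutgers'))   # 2
print(schools.count('UPenn'))     # 1
```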

But how can I _make_ a list one item at a time? You can use a __LOOP__.

A __for__ loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string). This is less like the _for_ keyword in other programming languages, and works more like an iterator method as found in other object-orientated programming languages.

With the __for__ loop we can execute a set of statements, once for each item in a list, tuple, set etc.

The control-flow statements you can work with include:
- __for__ loops, where you do A once for each item in a sequence
- __if/else__ statements (conditionals, not loops), where "if" a condition is met, do A; otherwise, do B
- __while__ loops, where you do A repeatedly for as long as a condition remains true
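As a minimal sketch of a __while__ loop (the variable names are made up for illustration), the loop below keeps running as long as its condition is true:

```python
# Count down from 3 with a while loop
count = 3
countdown = []

while count > 0:            # keep looping as long as this is True
    countdown.append(count)
    count = count - 1       # without this line the loop would never end

print(countdown)            # [3, 2, 1]
```

Forgetting to update the variable in the condition is the usual cause of a loop that never stops.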

In [None]:
# Use a 'for' loop to add elements of list1 and list2 into a new list
list1 = ['a','b','c']
list2 = list( (1, 2, 3) )

list3 = []

for stuff in list1:
    list3.append(stuff)
    
for item in list2:
    list3.append(item)

print(list3)

In [None]:
# Add some items to a list using a for loop

need_food    = ['apple', 'banana', 'cherry', 'orange', 'kiwi', 'melon', 'mango']
grocery_list = []

print(need_food)

for item in need_food:
    print(item+item)
    grocery_list.append(item)
    
print()
print(grocery_list)

In [None]:
# You can put an "if" statement inside a loop to act only on certain items

grocery_list = []

for item in need_food:
    
    if item == "orange":
    
        grocery_list.append(item)
    
print(need_food)
print()
print(grocery_list)

In [None]:
# If you want to stop the loop based on a specific condition, use "break"

for item in need_food:
    if item == 'cherry':
        break
    else:
        print(item)
    


In [None]:
# If you want to skip the rest of the current iteration and move on to the next item, use "continue"

for item in need_food:
    if item == 'cherry':
        continue
    else:
        print(item)



In [None]:
# Need the position of each item as well as the item itself? Try "range()" with "len()"

for i in range( len(need_food) ):
    print(i)
    print(need_food[i])




A "Pythonic" way to make lists less bulky is to use __list comprehension__, which offers a shorter syntax when you want to create a new list based on the values of an existing list.

With a list comprehension, you are building a loop in a single line.

In [None]:
list1 = ['a','b','c','d','e']

list2 = [ list1[i] for i in range(len(list1)) ]
print(list2)

list3 = [ list1[i] for i in range(len(list1)) if i!=2 ]
print(list3)
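A comprehension can also loop over the items themselves, so no index is needed. This is a sketch; the names `upper` and `no_c` are only for illustration.

```python
list1 = ['a', 'b', 'c', 'd', 'e']

# Loop over the items directly and transform each one
upper = [letter.upper() for letter in list1]
print(upper)        # ['A', 'B', 'C', 'D', 'E']

# Keep only the items that pass a condition
no_c = [letter for letter in list1 if letter != 'c']
print(no_c)         # ['a', 'b', 'd', 'e']
```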

A _list_ stores items by position, but if you want to look items up by a specific label, you can use a __dictionary__. A dictionary is accessed by its "key": once it finds that key, it returns the "value" associated with it. Think of the key as the word you want to learn about in an actual dictionary, and its definition as its value.

Dictionaries are ordered (as of Python 3.7, they keep insertion order) and changeable, and they do not allow duplicate keys (one can, however, have duplicated values). Values can be of any data type.

Dictionary items are presented in key:value pairs, and can be referred to by using the key name.

Dictionaries are changeable, meaning that we can change, add or remove items after the dictionary has been created.



In [None]:
food_dictionary = {'mango':4,
                   'banana':2,
                   'apple':19}

print(food_dictionary)


In [None]:
# Now that a dictionary has been made, how do you access it?

print(food_dictionary['banana'])
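Because dictionaries are changeable, the same square-bracket syntax can change, add, or remove items. A minimal sketch using the same dictionary:

```python
food_dictionary = {'mango': 4, 'banana': 2, 'apple': 19}

food_dictionary['banana'] = 6      # change the value of an existing key
food_dictionary['kiwi'] = 3        # add a new key:value pair
del food_dictionary['apple']       # remove a key and its value

print(food_dictionary)             # {'mango': 4, 'banana': 6, 'kiwi': 3}
```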


Some great dictionary methods to work with:

| Method |	Description |
|----|----|
|clear() |	Removes all the elements from the dictionary |
|copy()	| Returns a copy of the dictionary |
|fromkeys() |	Returns a dictionary with the specified keys and value |
|get() |	Returns the value of the specified key |
|items() |	Returns a view of the dictionary's (key, value) pairs as tuples |
|keys() |	Returns a view of the dictionary's keys |
|pop() |	Removes the element with the specified key |
|popitem() |	Removes the last inserted key-value pair |
|setdefault()	| Returns the value of the specified key. If the key does not exist: insert the key, with the specified value |
|update() |	Updates the dictionary with the specified key-value pairs |
|values() |	Returns a view of all the values in the dictionary |
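A few of these methods in action, as a sketch on the same example dictionary:

```python
food_dictionary = {'mango': 4, 'banana': 2, 'apple': 19}

# get() returns a default instead of raising an error for a missing key
print(food_dictionary.get('banana'))      # 2
print(food_dictionary.get('cherry', 0))   # 0

# items() pairs each key with its value, which is handy in a for loop
for food, amount in food_dictionary.items():
    print(food, amount)

print(list(food_dictionary.keys()))       # ['mango', 'banana', 'apple']
print(sum(food_dictionary.values()))      # 25

removed = food_dictionary.pop('apple')    # remove 'apple' and return its value
print(removed)                            # 19
```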

# Reading files

Files stored on your computer can be opened in Python and their contents stored for later use. This is done using the open() function, which takes two pieces of information: the path to the file you are reading, making, or editing, and the mode, which says what you want to do with it.

The file does NOT need to exist already in order to use this function: depending on the mode, open() can create the file as well.

Make sure to save the result in a variable. It should look something like this:

x = open(file, mode)

Modes include:
- "r" - Read - Default value. Opens a file for reading, raises an error if the file does not exist

- "a" - Append - Opens a file for appending, creates the file if it does not exist

- "w" - Write - Opens a file for writing, creates the file if it does not exist and erases its contents if it does

- "x" - Create - Creates the specified file, raises an error if the file already exists
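The sketch below tries each mode in turn. It works in a temporary folder so that no real files are touched; the file name `notes.txt` is hypothetical.

```python
import os
import tempfile

# Work in a temporary folder so no real files are touched
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")

f = open(path, "w")        # "w" creates the file (or erases an existing one)
f.write("first line\n")
f.close()

f = open(path, "a")        # "a" adds to the end without erasing
f.write("second line\n")
f.close()

f = open(path, "r")        # "r" reads the file back
contents = f.read()
f.close()
print(contents)

try:
    open(path, "x")        # "x" refuses to overwrite an existing file
    created = True
except FileExistsError:
    created = False
    print("notes.txt already exists")
```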

In addition to these modes, there are special characters that are written with a backslash, the "escape" character. You may need to use them to be very specific in how you read or write your files. Examples include:
- ' \' ' or ' \" ' keeps a quotation mark inside a string without ending the string
- ' \t ' is a tab, a single character that editors often display as four or eight spaces wide
- ' \n ' is a new line, which moves the text that follows to the next line
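A quick sketch of these escape sequences inside strings (the example text is made up):

```python
# \' keeps a single quote inside a single-quoted string
quote = 'It\'s a boat'
print(quote)

# \t inserts one tab character, \n starts a new line
print("name\tcredits")
print("line one\nline two")

# Each escape sequence counts as a single character
print(len("\t"))    # 1
print(len("\n"))    # 1
```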

In [None]:
# Write a file called boat.txt that contains the lyrics to Row, Row, Row Your Boat.

x = open("boat.txt", "w")

song = ["Row, row, row your boat", "Gently down the stream", "Merrily, merrily, merrily, merrily", "Life is but a dream."]

for item in song:
    x.write(item + "\n" )
    
x.close()

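How file information is organized upon reading depends on the method used.
- read(): returns the entire file as a single string
- readline(): returns the next line of the file as a string
- readlines(): returns a list in which each element is one line of the file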
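To compare the three methods side by side, here is a sketch that writes a two-line file into a temporary folder (the name `demo.txt` is hypothetical) and reads it back each way. It uses a `with` block, which closes the file automatically when the block ends.

```python
import os
import tempfile

# Write a small two-line file in a temporary folder
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("Row, row, row your boat\nGently down the stream\n")

with open(path, "r") as f:
    whole = f.read()          # the entire file as one string

with open(path, "r") as f:
    first = f.readline()      # only the next line, including its "\n"

with open(path, "r") as f:
    lines = f.readlines()     # a list with one string per line

print(repr(whole))
print(repr(first))
print(lines)
```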

In [None]:
y = open("boat.txt", "r")

data = y.readlines()    # a list with one string per line

y.close()               # close the file once its contents are stored

print(data[0])



In [None]:
print(data)

data = [i.rstrip('\n') for i in data]

print(data)

Extensions on file names are important. 
- A '.txt' file is a text file
- A '.csv' file is a tabular text file where items in each line are separated by commas (hence, Comma-Separated Values)

In [None]:
# Write a file called freshman.csv that lists five courses and, for each one,
# a course number, the number of credits, the days, and the time it meets







In [None]:
# Open freshman.csv and add two additional courses






# Modules 

There are a lot of instances where you may want to run the same bit of code many times. Instead of rewriting that bit of code repeatedly, a "function" can be built. 

A basic function contains a few pieces, including:
- A line that defines the function and its name: "def my_function():"
- An indented block containing the code to run.

In [None]:
def test_function():
    print("This is my test function")
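Defining a function does not run it; the code inside runs only when the function is called by name. As a sketch, the hypothetical function `count_vowels` below also takes an argument and hands back a result with `return`:

```python
def count_vowels(word):
    """Return the number of vowels in word."""
    total = 0
    for letter in word.lower():
        if letter in "aeiou":
            total = total + 1
    return total

# Call the function as many times as needed
print(count_vowels("banana"))   # 3
print(count_vowels("Kiwi"))     # 2
```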

However, the code you need may already exist, so there is no reason to rebuild it. Files of ready-made code that come with a standard Python build are referred to as "modules". These files can be imported into your code, and once that's done, you have the ability to use all of the functions in that module.

Standard library modules include:
- _os_: a module that allows Python to interact with the operating system
- _sys_: a module that provides functions and variables for working with the Python runtime environment
- _math_: a standard mathematics function library
- _html_ and _urllib_: modules that allow users to work with data from websites and parse information effectively
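Once a module is imported, its functions are reached with the module name, a dot, and the function name. A short sketch with three of the modules listed above:

```python
import math
import os
import sys

print(math.sqrt(16))        # 4.0
print(math.pi)              # 3.141592653589793

print(os.getcwd())          # the folder Python is currently working in
print(sys.version)          # the version of Python that is running
```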


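For data analysis and data science, two great packages are NumPy and pandas. Unlike the modules above, they are not part of the standard library and must be installed separately (distributions such as Anaconda include them). __NumPy__ (https://numpy.org/doc/stable/contents.html) is a package that lets you organize data into multidimensional arrays. 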

In [None]:
import numpy as np


Functions for generating basic arrays of a specific shape include:

| Method |	Description |
|----|----|
|zeros( (N, M, ...) ) | Array of all 0's with shape N x M x ... |
|ones( (N, M, ...) )	| Array of all 1's with shape N x M x ... |
|empty( (N, M, ...) ) | Uninitialized array with shape N x M x ... (its entries are arbitrary, not zeros) |
|eye( N ) | The N x N identity matrix |
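A brief sketch of two of these functions, showing the shape and data type of the result:

```python
import numpy as np

zeros = np.zeros((2, 3))    # 2 rows, 3 columns, all 0.0
identity = np.eye(3)        # 3 x 3 identity matrix

print(zeros)
print(zeros.shape)          # (2, 3)
print(identity)
print(identity.dtype)       # float64
```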

Many mathematical operations are available as functions, including:
- add(), subtract(), multiply(), divide()
- sin(), cos(), tan(); arcsin(), arccos(), arctan(); sinh(), cosh(), tanh()
- Angle conversions: deg2rad() and rad2deg()
- exp() and log() 
- Linear Algebra multiplication operations: dot() and cross()
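These functions work on whole arrays at once, element by element. A sketch with two small example arrays (`a` and `b` are made up for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.add(a, b))         # [5 7 9]
print(np.multiply(a, b))    # [ 4 10 18]
print(np.dot(a, b))         # 32
print(np.cross(a, b))       # [-3  6 -3]

angle = np.deg2rad(90)      # 90 degrees in radians
print(np.sin(angle))        # 1.0
```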

Finally, one can load text files into NumPy arrays and save arrays to text files.
- loadtxt(filename)
- genfromtxt(filename, delimiter=',')
- savetxt(filename, array_name, delimiter=',')
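A round trip can be sketched as follows: save a small array to a comma-separated file, then load it back. The file is written to a temporary folder, and the name `array.csv` is hypothetical.

```python
import os
import tempfile
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Save to a temporary folder so no real files are touched
path = os.path.join(tempfile.mkdtemp(), "array.csv")
np.savetxt(path, data, delimiter=",")

loaded = np.loadtxt(path, delimiter=",")
print(loaded)
print(np.array_equal(data, loaded))   # True
```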

In [None]:
np.multiply(25, 1025)


In [None]:
x = np.ones( (3,3) )
print(x)

In [None]:
x.dtype

In [None]:
np.exp( np.multiply(-25, 1025) )

Another package is __Pandas__, the "Python Data Analysis Library" (https://pandas.pydata.org/docs/). This package organizes data into labeled tables called DataFrames.

pandas has two main data structures: `Series` objects and `DataFrame` objects.
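A `Series` is a single labeled column of values. A minimal sketch, with made-up course names and credit counts:

```python
import pandas as pd

# A Series holds one column of values, each with an index label
credits = pd.Series([3, 4, 1], index=['Calculus', 'Physics', 'Lab'])

print(credits)
print(credits['Physics'])   # 4
print(credits.sum())        # 8
```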

In [None]:
import pandas as pd


# We can make a DataFrame using a dictionary
# the dictionary keys are the column labels
# the dictionary values are columns

df = pd.DataFrame({'one':[3,4,5,2,4,5], 
                   'two':['a','b','e','h','l','p']})

df

In [None]:
df['two']

`df` is a `DataFrame`: the unlabeled column is the index, and each labeled column is itself a `Series` object.

Helpful functions include:
- read_csv() and to_csv()
- head() and tail()
- sort_values(_column_header_name_)
- drop( [] ) and dropna()
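Several of these functions can be tried on a small table. In this sketch the data are made up, and one value is deliberately missing so that dropna() has something to remove:

```python
import pandas as pd

small = pd.DataFrame({'one': [3, None, 5, 2],
                      'two': ['a', 'b', 'e', 'h']})

print(small.head(2))                # first two rows
ordered = small.sort_values('one')  # rows ordered by column 'one'
print(ordered)

cleaned = small.dropna()            # drop rows that contain a missing value
print(cleaned)
print(len(cleaned))                 # 3
```

Each of these calls returns a new `DataFrame` and leaves the original unchanged.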

Getting descriptive statistics from `DataFrame` and `Series` objects is easy:
- describe(), which summarizes the count, mean, standard deviation, min, quartiles, and max of numeric data
- min() and max()
- mean(), median(), and mode()
- value_counts()
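A sketch of these statistics on the numeric column of the small example table from earlier:

```python
import pandas as pd

df = pd.DataFrame({'one': [3, 4, 5, 2, 4, 5],
                   'two': ['a', 'b', 'e', 'h', 'l', 'p']})

print(df['one'].min())            # 2
print(df['one'].max())            # 5
print(df['one'].median())         # 4.0

counts = df['one'].value_counts() # how often each value appears
print(counts)
print(df['one'].describe())       # count, mean, std, min, quartiles, max
```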


In [None]:
df.sort_values("one")

In [None]:
df = pd.read_csv("catsvdogs.csv")

df.head(5)

In [None]:
df.dtypes

In [None]:
df["Percentage of households with pets"].describe()

In [None]:
df.describe()

In [None]:
df.sort_values(['Percentage of households with pets'], ascending=False)

In [None]:
X = df['Percentage of households with pets'].to_numpy()
X