
# Demonstration Workbook 1

#### Outline of topics discussed in this module:

1. Brief history of Python
2. Python's place in the family of programming languages
3. Basic variables (integer, float, string)
4. Conversions between the basic variable types
5. Boolean variables
6. "Collection" data types
    a. lists
    b. tuples
    c. sets
    d. dictionaries
7. Operators
8. Control statements
9. Functions
10. Major Python Packages (NumPy, Pandas, matplotlib)
11. Agricultural data analytics examples


#### Some useful background material that will be referenced throughout the module:

The <a href="https://the-examples-book.com/book/introduction" target="_blank">Purdue Data Mine Examples Book</a> contains many useful chapters on data science. While they have not been directly designed for this class, they may be useful. You will not need to use scholar to perform the exercises of this class so don't worry about that part. Here is a direct link to the <a href="https://the-examples-book.com/book/python/introduction" target="_blank">Python chapter.</a>


Additional useful links for Python include:

<a href="https://docs.python.org/3/" target="_blank">Python 3 documentation</a>

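The <a href="https://pypi.org/" target="_blank">Python Package Index</a> (This contains many of the useful Python "add-on" packages such as NumPy and pandas. Note that the math package is not an add-on: it ships with Python as part of the standard library.)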

The <a href="https://numpy.org/" target="_blank">NumPy Package</a> (This contains specialized array (vector and matrix) routines. NumPy stands for "Numerical Python")

United States Department of Agriculture: <a href="https://quickstats.nass.usda.gov/" target="_blank">Quick Stats</a> (The USDA's National Ag Statistics Service -- go here and familiarize yourself with the available data)

You will need to use Git and GitHub to get the example code for the class. Some useful tutorial links are:

<a href="https://www.youtube.com/watch?v=USjZcfj8yxE" target="_blank">Learn Git in 15 Minutes</a> (Colt Steele)

<a href="https://www.youtube.com/watch?v=USjZcfj8yxE" target="_blank">Learn Github in 20 Minutes</a> (Colt Steele)

It makes sense to create a free GitHub account for yourself: <a href="https://github.com/" target="_blank">GitHub</a>

#### **History of Python, Etc.**

- Conceived by Guido van Rossum in December 1989 at Centrum Wiskunde & Informatica (CWI), the Dutch national research institute for mathematics and computer science.
- Python version 1.0 was released in January 1994.
- The license has been GPL-compatible (open source) since version 1.6.1.
- Python version 2.0 was released in October 2000.
- The Python Software Foundation was formed in 2001, along with a new open source license.
- Python version 2.7 was the last release in the version 2 series. Support ended in January 2020.
- Python version 3.0 was released in December 2008. It broke backward compatibility with much of the version 2 code.
- Version 3.10 was released in October 2021; newer 3.x versions have been released since.

#### According to the Stack Overflow surveys of professional software developers in 2021 and 2023 ...

<img align="left" src='Figs/DeveloperSurvey2021.png' width="450"/>
<img align="right" src='Figs/DeveloperSurvey2023.png' width="450"/>

Check out the Stack Overflow developer survey results here: [2021](https://insights.stackoverflow.com/survey/2021), [2022](https://survey.stackoverflow.co/2022#overview), [2023](https://survey.stackoverflow.co/2023/#education-ed-level-learn)

#### As concerns languages for data science ...
The contenders are Python and R.   
For **Python**

* Most popular among data scientists.
* Very useful in machine learning and artificial intelligence because of the availability of popular libraries such as scikit-learn, matplotlib, and tensorflow.

For **R**
* A scripting language.
* Very good support for statistical computation and visualization.

This course is focused on Python instead of R because Python is a better fit for the work of my research group. For an independent comparison of the two: <a href="https://www.ibm.com/cloud/blog/python-vs-r" target="_blank">Python vs. R: What's the Difference?</a>

## Python Programming Language: Basics of Programming

#### *Discussion 1*: Introduction to Python basic variables

Python uses **dynamic typing**: a variable is created when a value is first assigned to it, and its type is determined by that value. The type can change if a value of a different type is assigned later. There are a few rules on variable names:

* Must start with a letter or underscore
* May contain only letters, digits, and underscores after the first character
* Names are case-sensitive
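
As a small sketch of dynamic typing and the naming rules, the cell below rebinds one name to values of different types. The variable names (`crop_count`, `yield_bu`) and their values are made up for illustration only.

```python
# A name is created when a value is first assigned to it.
crop_count = 4
first_type = type(crop_count).__name__

# Assigning a different kind of value to the same name changes its type.
crop_count = "four"
second_type = type(crop_count).__name__
print(first_type, "->", second_type)

# Names are case-sensitive: these are two different variables.
yield_bu = 180
Yield_bu = 200
print(yield_bu, Yield_bu)

# str.isidentifier() reports whether a string follows the naming rules.
print("_field2".isidentifier())   # starts with an underscore: allowed
print("2field".isidentifier())    # starts with a digit: not allowed
```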

A Python variable is more than just its value; it also carries information about the type of the value. This flexibility comes with some overhead. The code below illustrates three of the variable types: **integer, float, and string**.

In [4]:
"""Integer, i.e., whole numbers both positive and negative. Later on 
we will illustrate formatting the print command."""
x = 4 # declare x to be an integer
print('The type of x is:', type(x)) # print the type of x
print() # Just to give a space.
print('The value of x is:', x) # print the value of x

The type of x is: <class 'int'>

The value of x is: 4


In [3]:
# Floating point, i.e., computer representation of real numbers.
x = 4.0
print('The type of x is:')
print(type(x))
print() # Just to give a space.
print('The value of x is:')
print(x)

The type of x is:
<class 'float'>

The value of x is:
4.0


In [3]:
# Strings. A string is a sequence of characters. They can be delimited
# by single quotes ('blah') or double quotes ("blah blah")
x = "four"
print('The type of x is:')
print(type(x))
print() # Just to give a space.
print('The value of x is:')
print(x)

The type of x is:
<class 'str'>

The value of x is:
four


**Data type conversion**  
- Python has a built-in function `float()` that can convert integers and certain strings to floating point numbers.  


In [2]:
# Start with an int.
x = 4 # declare x to be an integer
y = "4" # declare y to be a string        
print('The type of x is: ',type(x))
print('The type of y is: ',type(y))
print('The value of x is: ',x)
print('The value of y is: ',y)
print('The sum of x and y is: ',x+y) # the error from this statement is intentional and self-explanatory.

The type of x is:  <class 'int'>
The type of y is:  <class 'str'>
The value of x is:  4
The value of y is:  4


TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [6]:
# Convert to float
x = float(x)
y = float(y)
print('The type of x is:',type(x))
print('The value of x is: ',x)
print('The type of y is:',type(y))
print('The value of y is: ',y)
print('The sum of x and y is: ',x+y)# did we correct the error?


The type of x is: <class 'float'>
The value of x is:  4.0
The type of y is: <class 'float'>
The value of y is:  4.0
The sum of x and y is:  8.0


There is also a Python function `int()`, which converts floats and certain strings to integers, and a function `str()`, which converts numbers to strings.
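
A short sketch of `int()` and `str()` in action, including one case that fails. The sample values are arbitrary.

```python
# float -> int drops the fractional part (it truncates, it does not round).
truncated = int(4.9)
print(truncated)

# A string of digits converts cleanly to an int.
print(int("42"))

# Numbers convert to strings with str().
as_text = str(3.5)
print(as_text, type(as_text))

# int() does not accept a string that looks like a float ...
try:
    int("4.0")
except ValueError as err:
    print("int('4.0') failed:", err)

# ... but converting in two steps works.
two_step = int(float("4.0"))
print(two_step)
```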

#### *Discussion 2*: More data types

**Boolean data type**

A Boolean value has the Python type **bool**. The possible values a Boolean variable can take are: **True** and **False**. These are typically used to hold the results of logical tests, which in turn can be used to control the flow of a Python program.

In [8]:
x = True # declare x to be a boolean
print('The type of x is:')
print(type(x))
print()
print('The value of x is:')
print(x)

The type of x is:
<class 'bool'>

The value of x is:
True


**Collection data types**  
There are four **collection** data types: **lists**, **tuples**, **sets**, and **dictionaries**. (Some say that a **string** is also a collection data type since it is an ordered sequence of characters.) We consider each of the four in turn.

 <u>Lists</u> are ordered, changeable, and allow duplicate members:

In [3]:
# Create a list with 5 elements.
Coloradothings = ["wheat", "corn", "sugar beets", "pinto beans", 1959] # declare Coloradothings to be a list
print('The type of Coloradothings is:')
print(type(Coloradothings))
print()# Just to give a space. You will not need this in your code.
print('The length of Coloradothings is:')#What do you think this will print? 
print(len(Coloradothings))
#Can you print the first element?
print("The first element of Coloradothings is:",Coloradothings[0]) #note the index starts at 0. So the last element is at index 4.
print() # Just to give a space. You will not need this in your code.
print('The value of Coloradothings is:') # print the value of Coloradothings. Would you have guessed this output?
print(Coloradothings)

The type of Coloradothings is:
<class 'list'>

The length of Coloradothings is:
5
The first element of Coloradothings is: wheat

The value of Coloradothings is:
['wheat', 'corn', 'sugar beets', 'pinto beans', 1959]


In [4]:
# The elements inside of Coloradothings may be of differing
# types ...
print('For Coloradothings[3] ...')
print(Coloradothings[3])
print(type(Coloradothings[3]))
print()
print('For Coloradothings[4] ...')
print(Coloradothings[4]) #is it the last element? Bingo!
print(type(Coloradothings[4]))

For Coloradothings[3] ...
pinto beans
<class 'str'>

For Coloradothings[4] ...
1959
<class 'int'>


In [11]:
# We can append to a list and insert in a list

Coloradothings.append("Amherst")#append adds an element to the end of the list
print(Coloradothings)
print()
Coloradothings.insert(2, "sunflowers")#insert adds an element at the specified index
print(Coloradothings)

['wheat', 'corn', 'sugar beets', 'pinto beans', 1959, 'Amherst']

['wheat', 'corn', 'sunflowers', 'sugar beets', 'pinto beans', 1959, 'Amherst']
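
Lists are "changeable" in more ways than appending and inserting. The sketch below uses a separate, made-up list named `crops` to show replacing, removing, and slicing elements.

```python
crops = ["wheat", "corn", "sugar beets", "pinto beans", "sunflowers"]

crops[1] = "sweet corn"       # replace the element at index 1
crops.remove("pinto beans")   # remove the first element equal to this value
last = crops.pop()            # remove and return the last element
print(crops)
print("popped:", last)

print(crops[-1])              # a negative index counts from the end
print(crops[0:2])             # a slice: elements at index 0 and 1
```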


<u>Tuples</u> are ordered, unchangeable, and allow duplicate members.

In [5]:
# Make a tuple. A tuple is an immutable list. Once you create it, you cannot change it.

Indianathings = ("Basketball", "Corn")

print(type(Indianathings))
print()
print(Indianathings)


<class 'tuple'>

('Basketball', 'Corn')


In [6]:
Indianathings.append("Wall street")# this will generate an error because tuples are immutable.

""" Note: The error here is an attribute error. It is not a syntax error."""

AttributeError: 'tuple' object has no attribute 'append'

Tuples can contain a single item, but to do this we must put a comma after the first and only element,  
 e.g., Indianathings = `("Basketball")` is not a tuple (it's a string), while Indianathings = `("Basketball",)` is a tuple.

Tuples cannot be changed. For example, if *Indianathings* is a tuple then *Indianathings.append("Wall street")* will cause an error.
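
The points about single-item tuples and immutability can be checked directly. This is a minimal sketch; the names `indiana`, `bigger`, `sport`, and `crop` are chosen only for illustration.

```python
not_a_tuple = ("Basketball")    # parentheses alone do nothing: this is a str
one_tuple = ("Basketball",)     # the trailing comma makes a 1-element tuple
print(type(not_a_tuple), type(one_tuple))

indiana = ("Basketball", "Corn")

# Item assignment fails because tuples are immutable.
try:
    indiana[0] = "Football"
except TypeError as err:
    print("TypeError:", err)

# "Adding" to a tuple builds a new tuple; the original is unchanged.
bigger = indiana + ("Soybeans",)
print(indiana)
print(bigger)

# Tuples can be unpacked into separate variables.
sport, crop = indiana
print(sport, crop)
```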

<u>Sets</u> are unordered, changeable (in the sense that we can add and remove items from sets). Sets do not allow duplicates.

In [7]:
# Make a set. Sets are unordered collections of unique elements. As a result, they cannot be indexed.
Purduethings = {"Ag and Bio Engineering", "Ross-Ade Stadium", "students", "professors", "Gene Keady", "study sessions"}
print(type(Purduethings))
print(Purduethings) # Note the order it prints
print()

for x in Purduethings: # Note the order with which the for loop executes
    print(x)

print()
print("Ag and Bio Engineering" in Purduethings)
print("Medical School" in Purduethings)

<class 'set'>
{'Ag and Bio Engineering', 'Gene Keady', 'study sessions', 'professors', 'Ross-Ade Stadium', 'students'}

Ag and Bio Engineering
Gene Keady
study sessions
professors
Ross-Ade Stadium
students

True
False


From the code output above we note:
1. The order in which we listed the items when defining the set is not the order Python used when printing or looping over them. Sets have no guaranteed order.
2. The expression in the second-to-last print command, `"Ag and Bio Engineering" in Purduethings`, is a Boolean expression: it evaluates to True or False.

We can perform classical set operations (**union**, **intersection**, **difference**, **test subset**):

In [9]:
# Make another set ...
IUthings = {"Hoosiers", "Bobby Knight", "students", "professors", "parties"}
print('Purduethings union IUthings equals:')
print(Purduethings.union(IUthings)) # Union is the set of elements that are in either set (or both).
print()
print('Purduethings intersection IUthings equals:')
print(Purduethings.intersection(IUthings))#Intersection is the set of elements in both sets.
print()
print('IUthings not also in Purduethings equals:')
print(IUthings.difference(Purduethings)) #Difference is the set of elements in the first set but not the second.
print()
print({"Gene Cernan",}.issubset(Purduethings)) # Is the one-element set {"Gene Cernan"} a subset of Purduethings?
Purduethings.add("Gene Cernan") # add() puts a new element into the set
print({"Gene Cernan",}.issubset(Purduethings)) # Ask the same question again after adding the element.



Purduethings union IUthings equals:
{'Bobby Knight', 'Gene Keady', 'study sessions', 'parties', 'Ross-Ade Stadium', 'professors', 'students', 'Ag and Bio Engineering', 'Hoosiers'}

Purduethings intersection IUthings equals:
{'students', 'professors'}

IUthings not also in Purduethings equals:
{'parties', 'Bobby Knight', 'Hoosiers'}

False
True
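
The same set operations can also be written with operators. The sketch below uses two small made-up sets, plus a hypothetical `harvest_log` list to show a common use of sets: removing duplicates.

```python
purdue = {"students", "professors", "Ross-Ade Stadium"}
iu = {"students", "professors", "Hoosiers"}

print(purdue | iu)              # union, same as purdue.union(iu)
print(purdue & iu)              # intersection
print(iu - purdue)              # difference
print({"students"} <= purdue)   # subset test, same as issubset()

# Converting a list to a set drops the duplicates.
harvest_log = ["corn", "wheat", "corn", "soybean", "wheat"]
unique_crops = set(harvest_log)
print(sorted(unique_crops))     # sorted() returns a list in a fixed order
```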


**Everyone knows how to read English dictionaries**  
<img align="left" src='Figs/English-dictionary.webp' width="450"/>
<img align="right" src='Figs/dictionary.jpg' width="350"/>


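<u>Python Dictionaries</u> are changeable and indexed by key; since Python 3.7 they also preserve the order in which items were inserted. They are written with "{}" and made up of key-value pairs.  
A key-value pair is a key and a value separated by a colon. Keys are usually strings (any immutable type such as a number or tuple also works), and values can be of any type. Different key-value pairs are separated by commas. It looks like `{"key1": "value1", "key2": "value2"}`.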

In [10]:
# Make some dictionaries of farm equipment.
# Declare 5 dictionaries. Keys are brand, model, and year (the last one also has color). Values are the values you choose.
OldCombine = {"brand": "CASE", "model": "7130", "year": 2014} 
NewCombine = {"brand": "CASE", "model": "8240", "year": 2016}
Tractor1 = {"brand": "CASE", "model": "290", "year": 2013}
Pickup = {"brand": "CHEVY", "model": "Silverado", "year": 2005}
FavoriteOldCombineEver = {"brand": "JD", "model": "7720", "year": 1978, "color": "green"}

print(type(FavoriteOldCombineEver))
print(FavoriteOldCombineEver)

<class 'dict'>
{'brand': 'JD', 'model': '7720', 'year': 1978, 'color': 'green'}


**Note**: Dictionaries can contain dictionaries.

In [11]:
# Create a dictionary of farm equipment from the dictionaries of
# individual machines.

FarmEquipment = {"C1": OldCombine, "C2": NewCombine, "T1": Tractor1, "P1": Pickup, "C3": FavoriteOldCombineEver}
print(FarmEquipment)

{'C1': {'brand': 'CASE', 'model': '7130', 'year': 2014}, 'C2': {'brand': 'CASE', 'model': '8240', 'year': 2016}, 'T1': {'brand': 'CASE', 'model': '290', 'year': 2013}, 'P1': {'brand': 'CHEVY', 'model': 'Silverado', 'year': 2005}, 'C3': {'brand': 'JD', 'model': '7720', 'year': 1978, 'color': 'green'}}
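
Once a dictionary exists, values are looked up by key. The sketch below repeats two of the machines above under new names; the `hours` entry is a made-up value added only to show how a key is inserted.

```python
old_combine = {"brand": "CASE", "model": "7130", "year": 2014}
pickup = {"brand": "CHEVY", "model": "Silverado", "year": 2005}
farm_equipment = {"C1": old_combine, "P1": pickup}

print(old_combine["year"])               # look up a value by its key
print(farm_equipment["P1"]["model"])     # chain keys to reach a nested dictionary

old_combine["hours"] = 3200              # assigning to a new key adds a pair (hypothetical value)
print(farm_equipment["C1"])              # the nested entry refers to the same dictionary

print(pickup.get("color", "unknown"))    # get() returns a default instead of raising KeyError

for key, value in pickup.items():        # loop over the key-value pairs
    print(key, "->", value)
```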


**Python provides a number of methods for operating on strings ...**

Methods are written with dot notation: a string or string variable is followed by a dot, the method name, and parentheses holding any parameters. Let `my_string` be a variable that holds a string. Some important methods are

* `my_string.find('xyz')` -- returns the index in the original string where the first occurrence of `'xyz'` begins, or -1 if it is never found

* `my_string.upper()` -- makes a new string which is the upper case version of the original. There is also a `.lower()`

* `my_string.replace('old','new')` -- returns a new string in which every occurrence of substring `'old'` is replaced with substring `'new'`; an optional third argument limits the number of replacements

* `my_string.split()` -- with no arguments it splits an input string at white space characters, e.g., space, tab, new line and produces a list containing the substrings.

There are many other string methods.
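
A quick sketch of the four methods listed above, applied to a made-up string. Note that none of them changes `my_string` itself; each returns a new value.

```python
my_string = "corn, corn, and more corn"

first_hit = my_string.find("corn")       # index where the first match begins
missing = my_string.find("wheat")        # -1 because "wheat" never appears
print(first_hit, missing)

print(my_string.upper())

all_replaced = my_string.replace("corn", "wheat")      # every occurrence
one_replaced = my_string.replace("corn", "wheat", 1)   # only the first occurrence
print(all_replaced)
print(one_replaced)

words = my_string.split()                # split at white space
print(words)

print(my_string)                         # the original string is unchanged
```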

In [None]:
# The split method applies to the processing of text strings. Remember
# that a text string is a sequence of characters, sometimes called an array
# of characters.

# An example text string is one of the lines from the Limerick poem of the
# previous lab ...

ExampleString = 'There once was a fly on the wall'

# Check the type of the ExampleString variable
print("The data type of the ExampleString : ",type(ExampleString))

# How long is it?
print("The length of the ExampleString variable: ",len(ExampleString))

# What is the first element?
print("the first element of ExampleString",ExampleString[0])

# What is the last element?
print("the last element of ExampleString",ExampleString[31])

In [None]:
# Note that we don't have to define the variable in order to refer
# to the individual elements in this way ...

'There once was a fly on the wall'[5]

' '

The split method applies to a text string, splitting it into smaller strings demarcated by some marker, which can be specified. It puts the results into a list of smaller text strings. For example ...

`'There once was a fly on the wall'`

In [None]:
# If we split on a single space character ...

'There once was a   fly on the wall'.split(' ')

['There', 'once', 'was', 'a', '', '', 'fly', 'on', 'the', 'wall']

In [None]:
# If we split on any white space ... runs of extra spaces are ignored ...
# White space refers to characters that leave no mark on a printed page:
# spaces, tabs, and new lines.

'There once was a   fly on the \t wall'.split()

['There', 'once', 'was', 'a', 'fly', 'on', 'the', 'wall']

In [None]:
len('There once was a   fly on the wall'.split())

8

#### *Discussion 3*: Operators

<u>Arithmetic operators</u>: `+`, `-`, `*`, `/`, `%`, `**`

In [14]:
# Arithmetic operators: +, -, *, /, %, **

print(7 + 5)  # addition
print(7 - 5)  # subtraction
print(7 * 5)  # multiplication
print(7 / 5)  # division
print(7 % 5)  # remainder upon integer division
print(7 ** 5) # exponentiation

12
2
35
1.4
2
16807
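
The `%` operator is usually paired with floor division `//`, an operator not listed above. As a small sketch, the two together split a division into a whole-number quotient and a remainder:

```python
print(7 / 5)     # true division always returns a float
print(7 // 5)    # floor division: the whole-number part of the quotient
print(7 % 5)     # the remainder

# divmod() returns both at once.
quotient, remainder = divmod(7, 5)
print(quotient, remainder)

# The two results always rebuild the original number.
print(5 * quotient + remainder == 7)

# With a negative number, // rounds down (toward negative infinity).
print(-7 // 5, -7 % 5)
```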


<u>Assignment operators</u>: `=`, `+=`, `-=`, `*=`, `/=`, `**=`

In [15]:
# Assignment operators: =, +=, -=, *=, /=, **=

b = 5
a = b
print(a)
a += b # shorthand for a = a + b
print(a)
a -= b # shorthand for a = a - b
print(a)
a *= b # shorthand for a = a*b
print(a)
a /= b # shorthand for a = a/b
print(a)
a **= b # shorthand for a = a**b
print(a)

5
10
5
25
5.0
3125.0


<u>Comparison operators</u>: ==, !=, <, <=, >, >=

In [15]:
# Comparison operators: ==, !=, <, <=, >, >=

a = 3
b = 2

print(f"a={a} is equal to b={b}. ",a == b)# note the double equal sign. This is a comparison operator. It returns a boolean value true or false.
print(a != b)# not equal to
print(a < b)# less than
print(a <= b)# less than or equal to
print(a > b)# greater than
print(a >= b)# greater than or equal to


a=3 is equal to b=2.  False
True
False
False
True
True


<u>Logical operators</u>: and, or, not

In [16]:
# Logical operators: and, or, not

x = (a == b) # The expression a == b is a Boolean value (either True or False).
             # The assignment creates a Boolean variable x
print(type(x))
print(x)

print()

y = not(x)# The not operator negates the Boolean value of x
print(type(y))
print(y)

print()

z = True

print(x or z)# The or operator returns True if either x or z is True
print(x and z)# The and operator returns True if both x and z are True


<class 'bool'>
False

<class 'bool'>
True

True
False
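
Comparison and logical operators are often combined into a single test. The sketch below uses a made-up temperature reading and made-up thresholds; it also shows that `and` stops evaluating as soon as the result is known.

```python
temperature_f = 72                 # hypothetical sensor reading

is_warm = temperature_f > 50
is_hot = temperature_f > 95
comfortable = is_warm and not is_hot
print(comfortable)

# Python also allows comparisons to be chained.
in_range = 50 < temperature_f <= 95
print(in_range)

# "and" short-circuits: the division is never attempted when the first test is False.
denominator = 0
safe_ratio = (denominator != 0) and (10 / denominator > 1)
print(safe_ratio)
```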


#### *Discussion 4*: Control statements
There are three methods of program control that we consider here:

1. For loops
2. While loops   
3. If/else statement

In [19]:
# While loop: Execute while condition is true.

i = 1
while i < 6:
    print(i)
    i += 1

1
2
3
4
5


In [20]:
# For loop: Iterate over a sequence. Also, have break (stop a loop where it is 
# and exit) and continue (move to the next iteration of loop).

for x in "banana":
    print(x)
    
print("\n")    
print("Try continue command")
print("\n")

for x in "banana":
    if x == "n":
        continue
    print(x)

b
a
n
a
n
a


Try continue command


b
a
a
a
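
The comment in the cell above mentions `break`, which exits the loop entirely instead of skipping one iteration. A minimal sketch, using the same word and a helper list `letters_seen` introduced only for this example:

```python
letters_seen = []

for x in "banana":
    if x == "n":
        break                  # leave the loop the first time we meet "n"
    letters_seen.append(x)
    print(x)

print("Letters before the first n:", letters_seen)

# for loops also work with range(), which produces a sequence of integers.
total = 0
for i in range(1, 6):          # 1, 2, 3, 4, 5
    total += i
print(total)
```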


In [18]:
# Example if/else statement

a = 5
b = 3
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")

a is greater than b


#### *Discussion 5*: Functions
   
Functions are blocks of code that run when called. 

- Parameters can be passed to a function. 
- A function can return a value. (In some languages a routine that returns no value is called a procedure or subroutine; in Python such a function returns `None`.)

Functions allow code to be more readable by allowing the hiding of details of an operation that may not be central to the understanding of the overall algorithm. Sometimes, this is called encapsulation. For example, perhaps we want to solve some sort of geometric problem, such as finding the height of a tree from the angle of the sun and the length of the shadow cast on the ground. The height calculation will involve intermediate calculations of trigonometric functions of the angle (e.g., sine, cosine, tangent). These sorts of intermediate calculations are naturally left to functions in python and other programming languages.

In addition, functions ...

- Assist in divide and conquer problem solving.
- Allow the function code to be reused in other parts of a larger program.  
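
The tree-height example described above could be sketched as follows. This assumes level ground and that the sun's angle is its elevation above the horizon, so that height = shadow length × tan(angle). The function name `tree_height` and the sample numbers are hypothetical.

```python
import math

def tree_height(shadow_length, sun_elevation_deg):
    """Estimate the height of a tree from its shadow (assumes level ground)."""
    angle_rad = math.radians(sun_elevation_deg)   # math.tan expects radians
    return shadow_length * math.tan(angle_rad)

# A 20 m shadow with the sun 45 degrees above the horizon (made-up numbers).
height = tree_height(20.0, 45.0)
print(round(height, 2))
```

The caller only needs the name and the two arguments; the unit conversion and the trigonometry stay hidden inside the function.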

Python has a list of built-in functions you can refer to at [Built-in Functions](https://docs.python.org/3/library/functions.html)

In [11]:
i=10
print("the range of values in I ", range(i)) #here print and range are Python's built-in functions

the range of values in I  range(0, 10)


In [13]:
# Create two lists
crop = ["corn", "winter-wheat", "soybean"]
cropcycle = [100, 240, 120]

# Use zip() to combine the lists element-wise. zip is a built in function
combined = zip(crop, cropcycle)

# Convert the result to a list (or another iterable, e.g., tuple)
combined_list = list(combined) # list is a function that converts its argument to a list

# Display the combined result
print(combined_list)

[('corn', 100), ('winter-wheat', 240), ('soybean', 120)]
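
The pairs produced by `zip()` are often used directly in a loop or turned into a dictionary. A short sketch with the same two lists; the names `cycle_by_crop` and `short` are introduced only for this example.

```python
crop = ["corn", "winter-wheat", "soybean"]
cropcycle = [100, 240, 120]

# dict() accepts the pairs from zip() and builds a lookup table.
cycle_by_crop = dict(zip(crop, cropcycle))
print(cycle_by_crop)
print(cycle_by_crop["soybean"])

# zip() can also feed a for loop, unpacking one pair per iteration.
for name, cycle in zip(crop, cropcycle):
    print(name, cycle)

# zip() stops at the end of the shorter input.
short = list(zip(crop, [100, 240]))
print(short)
```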


**Write your own function : find the square root**

According to Wikipedia, this algorithm is often called the Babylonian method (it is also known as Heron's method, after the first-century AD mathematician Hero of Alexandria) and is widely used for computing square roots by hand. The idea is this. If we want to find the square root of a positive number, say $Z$, we first start with a guess $x$ hoping $x^2 \approx Z$. Now if the original guess is too large, i.e., $x^2 > Z$ then $x > Z/x$ and so we could move in the correct direction (towards smaller values of $x$) by making a new guess equal to the average of $x$ and $Z/x$, i.e.,

New guess = $(x + Z/x)/2$.

If, on the other hand, the original guess was too small, i.e., $x^2 < Z$ then $x < Z/x$ and using the above formula for the new guess would move in the correct direction of larger values. The algorithm is implemented in Python in the function code below.

**Hand calculation example ...**  

Say Z = 10 and guess x = 3 for the square root. Then the next guess is the average of 3 and 10/3 = 3.3333..., which is approximately 3.16667. The next step in the algorithm averages 3.16667 and 10/3.16667 ≈ 3.15789, which gives an estimate of

3.16228
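
The hand calculation can be checked with a few lines of Python. This is only a sketch of the update rule New guess = $(x + Z/x)/2$ applied three times, starting from the same guess of 3.

```python
Z = 10
x = 3                        # initial guess from the hand calculation

for step in range(1, 4):     # apply the update rule three times
    x = (x + Z / x) / 2      # new guess = average of x and Z/x
    print(step, x)

# Squaring the final guess should give back a number very close to Z.
print(x * x)
```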

In [21]:
Z = .89
x = [1, 1, 1, 1, 1, 1, 1, 1] # a list that will hold the successive guesses; x[0] = 1 is the initial guess.
N = len(x) #len returns the length of its argument list
print(x[0])
i = 1
while i < N: #while is a loop that executes the code in its body while the condition is true, i.e., while i < N(=8)
    x[i] = (x[i-1] + Z/x[i-1])/2 #this is the Newton-Raphson method for finding the square root of Z
    print(x[i])#print the value of x[i] on each iteration to see how it converges.
    i = i+1 #increment i by 1

1
0.9450000000000001
0.9433994708994708
0.9433981132066374
0.9433981132056604
0.9433981132056604
0.9433981132056604
0.9433981132056604


**Now write a function for the square root calculation**  

Could you comment this code?


In [15]:
# Z is the positive number for which we want the square root. 
# epsilon is the tolerance in the accuracy of the result.
# The function returns the square root of Z.

def Newtroot(Z,epsilon):
    x = 1
    xp = (x + Z/x)/2
    e = (xp - x)/x
    while (e > epsilon) or (-epsilon > e):
        x = xp
        xp = (x + Z/x)/2
        e = (xp - x)/x
    return xp
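
One way to gain confidence in a function like this is to compare its result with `math.sqrt` from the standard library. The sketch below uses a lightly modified variant, hypothetically named `newton_sqrt`, that replaces the two-sided test with `abs()`; it is meant as an illustration, not as a replacement for the function above.

```python
import math

def newton_sqrt(Z, epsilon):
    """Approximate the square root of a positive number Z.

    Iteration stops when the relative change between successive
    guesses is no larger than epsilon.
    """
    x = 1.0                               # initial guess
    xp = (x + Z / x) / 2                  # first improved guess
    while abs((xp - x) / x) > epsilon:    # keep going while the guess still moves
        x = xp
        xp = (x + Z / x) / 2
    return xp

for Z in [0.89, 2, 10, 144]:
    estimate = newton_sqrt(Z, 1e-12)
    print(Z, estimate, abs(estimate - math.sqrt(Z)))
```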

**How to use functions?**  
Functions are often called inside loops, where each pass through the loop supplies different argument values to the function. 

In [22]:
# Example of the square root algorithm for every 10th number starting at 1
listofsquareroots = [] # create an empty list to store the square roots
epsilon = 1e-12 # tolerance for the square root algorithm
for numbers in range(1,100,10): # range(start, stop, step) gives every 10th number from 1 up to (not including) 100: 1, 11, 21, ..., 91
    z=Newtroot(numbers,epsilon) #call the function Newtroot
    listofsquareroots.append(z)# append the return value to a list. This will store the values each time the loop runs
print("the square roots of 1, 11, 21, ..., 91 are: ",listofsquareroots)


the square roots of 1, 11, 21, ..., 91 are:  [1.0, 3.3166247903554, 4.58257569495584, 5.567764362830022, 6.4031242374328485, 7.14142842854285, 7.810249675906654, 8.426149773176359, 9.0, 9.539392014169456]


**Would you like to visualize each step of this function?**  
Try copying the code above into the editor at [Pythontutor](https://pythontutor.com/) and following the example there. Paste the code to visualize the `Newtroot` function here: [visualizer/visual debugger](https://pythontutor.com/render.html#mode=display).  


![CodevisualizerExample.PNG](attachment:CodevisualizerExample.PNG)