# Introduction to Python

Basics of the Python standard library, NumPy, and Pandas.

## Standard Library data types:
* Single values:
  * Numeric (integers, floating point numbers)
  * Boolean
* Iterable objects:
  * Strings
  * Lists
  * Sets
  * Tuples
  * Dictionaries
  * Ranges

### Numeric data types

Numeric values can take on integer (int) or floating point (float) types.  Python will infer the specific type based on the value assigned. For example:

In [None]:
#  Create a numeric object
three = 3
print("The type of three is: ", type(three))

In [None]:
# Basic math operations are part of the standard library
two = 2
three * two

6
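
A small sketch of how the inferred type depends on the literal and on the operator. The variable names below are illustrative and are not part of the original notebook.

```python
# Literals without a decimal point are ints; with one, they are floats
whole = 3
fractional = 3.0
print(type(whole), type(fractional))

# Mixing an int and a float yields a float
mixed = whole * 2.5
print(mixed, type(mixed))

# "/" always returns a float; "//" is floor division
true_div = 7 / 2
floor_div = 7 // 2
print(true_div, floor_div)
```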

### Booleans

Booleans are True/False values.  They are often used in conditional statements and loops.  You can do math with boolean values, with `True` being treated as `1` and `False` as `0`. For example:

In [65]:
# Booleans
is_true = True
is_false = False
print("The object `is_true` is of type: ", type(is_true))
print("The object `is_false` is of type: ", type(is_false))

The object `is_true` is of type:  <class 'bool'>
The object `is_false` is of type:  <class 'bool'>


In [8]:
# math with booleans
2 * is_false

0
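
One common use of boolean arithmetic is counting how many values are `True`. The sketch below uses a made-up list (`passed` is a hypothetical name) to show the idea.

```python
# Hypothetical results: True means the student passed
passed = [True, False, True, True]

# Because True counts as 1 and False as 0, sum() counts the True values
n_passed = sum(passed)
share_passed = n_passed / len(passed)
print(n_passed, share_passed)

# bool is a subclass of int, which is why the arithmetic works
print(isinstance(True, int))
```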

## "Iterables"

Iterables are Python objects that contain multiple values. You can iterate over the elements in an iterable (hence the name), and you can usually also reference a specific value in the iterable.

### Strings

Strings are sequences of characters. You can iterate over each character in a string.

In [66]:
# Example of a name
name = "John"
print("The type of name is: ", type(name))
print("The first letter of name is: ", name[0])

The type of name is:  <class 'str'>
The first letter of name is:  J


In [None]:
# the for/in syntax can be used to iterate over the elements in an iterable
for letter in name:
    print(letter)
# Note that the print command below is only printed once,
# since it's not indented under the for loop (try indenting and see what happens)
print("the type of the object is ", type(name))

J
o
h
n
the type of the object is  <class 'str'>
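
Beyond iterating, strings support indexing and slicing. The following is a minimal sketch using the same `name` value; the other variable names are illustrative.

```python
name = "John"

# Negative indices count from the end
last_letter = name[-1]

# Slices use start:stop, with stop excluded
first_two = name[0:2]

# len() gives the number of characters
n_chars = len(name)

# Strings are immutable, so item assignment raises a TypeError
try:
    name[0] = "L"
except TypeError as err:
    print("Cannot modify a string in place:", err)

print(last_letter, first_two, n_chars)
```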


### Lists

Lists are containers of objects.  The objects can be of mixed types.  Lists are "mutable" (i.e., they can be altered after they are created). Lists are defined with square brackets `[]`.

In [None]:
our_list = ["Ayse", 3, True, list(("Hello", "Goodbye"))]
print("The type of our_list is: ", type(our_list))
print(our_list)

The type of our_list is:  <class 'list'>
['Ayse', 3, True, ['Hello', 'Goodbye']]


In [None]:
# Values in a list can be referenced by their index
# REMEMBER: Python uses 0-indexing, so the first value is at index 0
our_list[2] # 3rd value

True

In [None]:
# Lists are mutable, meaning we can modify an element in a list
# Here we change the 3rd element (index 2) from True to False
our_list[2] = False
print(our_list)

['Ayse', 3, False, ['Hello', 'Goodbye']]
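
A few other list operations come up often. This sketch assumes the list in its modified state from the cell above; the new variable names are illustrative.

```python
our_list = ["Ayse", 3, False, ["Hello", "Goodbye"]]

# append() adds one element to the end of the list
our_list.append(2.5)

# Chain indices to reach into a nested list
greeting = our_list[3][0]

# Slicing returns a new list; the stop index is excluded
first_two = our_list[0:2]

print(len(our_list), greeting, first_two)
```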


### Sets

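Sets are unordered collections of unique elements. They are useful when you want to store multiple items without duplicates.  Sets are defined by listing elements inside curly braces. Note that empty braces `{}` create a dictionary, so an empty set is written `set()`.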

In [None]:
our_set = {"1", 1, "one", 1, 2.2}
print("What type is our_set? ", type(our_set))
print(our_set)  # Note how the duplicate 1 (integer) is not included

What type is our_set?  <class 'set'>
{'one', '1', 2.2, 1}


In [None]:
# We can iterate over a set
for item in our_set:
    print(item)

one
1
2.2
1


In [69]:
# We cannot reference an element of a set by index
our_set[0]  # This will give an error

TypeError: 'set' object is not subscriptable
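
Although sets cannot be indexed, they support membership tests and set algebra. The rosters below are made up for illustration.

```python
# Hypothetical course rosters
econ = {"Ayse", "John", "Maria"}
stats = {"John", "Maria", "Wei"}

both = econ & stats        # intersection
either = econ | stats      # union
only_econ = econ - stats   # difference

# Membership tests use `in`
print("Ayse" in econ)

# Converting a list to a set drops duplicates
unique_values = set([1, 1, 2, 2, 3])
print(both, either, only_econ, unique_values)
```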

### Tuples

Tuples are similar to lists, but they are immutable (i.e., they cannot be changed after creation). Tuples are defined using parentheses.


In [70]:
our_tuple = ("Ayse", 3, True, list(("Hello", "Goodbye")))
print("What type is our_tuple? ", type(our_tuple))
print(our_tuple)

What type is our_tuple?  <class 'tuple'>
('Ayse', 3, True, ['Hello', 'Goodbye'])


In [71]:
# Since they are immutable, we can't change an element of a tuple
our_tuple[2] = False  # This will give an error
print(our_tuple)

TypeError: 'tuple' object does not support item assignment
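
Two related points are worth a quick sketch: tuples can be unpacked into separate names, and immutability applies to the tuple itself, not to mutable objects stored inside it. The names on the left of the unpacking are illustrative.

```python
our_tuple = ("Ayse", 3, True, ["Hello", "Goodbye"])

# Unpacking assigns each element to its own name
person, number, flag, greetings = our_tuple

# The tuple's elements cannot be reassigned, but a mutable object
# stored inside it (here, a list) can still be modified in place
our_tuple[3].append("See you")

print(person, number, flag)
print(our_tuple)
```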

### Dictionaries

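Python dictionaries are mutable collections of key-value pairs. Since Python 3.7 they preserve insertion order, although elements are accessed by key rather than by position. They are defined using curly braces, with keys and values separated by colons.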

Dictionaries are my favorite data structure because they allow fast lookups, are easy to modify, and can store highly complex data structures.

In [72]:
USC_football = {
    "year": [2022, 2023, 2024],
    "wins": [8, 5, 9]
}
print("What type is USC_football? ", type(USC_football))
print(USC_football)

What type is USC_football?  <class 'dict'>
{'year': [2022, 2023, 2024], 'wins': [8, 5, 9]}


In [73]:
# We can reference an element of a dictionary by its key
# Here we will reference the list of wins and pull the 2nd element
USC_football["wins"][1]  # Wins in 2023

5
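
Since dictionaries are mutable, entries can be added after creation, and lookups can be made safe against missing keys. In this sketch the `"source"` key and the `wins_by_year` lookup are illustrative additions, not part of the original notebook.

```python
USC_football = {
    "year": [2022, 2023, 2024],
    "wins": [8, 5, 9],
}

# Assigning to a new key adds an entry
USC_football["source"] = "illustrative entry"

# .get() returns a default instead of raising KeyError for a missing key
losses = USC_football.get("losses", "not recorded")

# zip pairs the two lists so we can build a year -> wins lookup
wins_by_year = dict(zip(USC_football["year"], USC_football["wins"]))

# .items() iterates over key-value pairs
for key, value in USC_football.items():
    print(key, value)
print(losses, wins_by_year[2023])
```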

### Ranges

A range object is an immutable sequence of integers that can be used in for loops or to create lists. It is defined using the `range()` function.

In [42]:
a_range = range(5)
print("What type is a_range? ", type(a_range))
print(a_range)

What type is a_range?  <class 'range'>
range(0, 5)
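
Printing a range shows only its definition, not its values. A short sketch of how to see the values and how the optional start and step arguments work (the names `evens` and `countdown` are illustrative):

```python
# range(stop) counts from 0 up to, but not including, stop
print(list(range(5)))

# range(start, stop, step) gives more control
evens = list(range(0, 10, 2))
countdown = list(range(3, 0, -1))
print(evens, countdown)

# A range stores only start, stop, and step, yet supports len() and indexing
a_range = range(5)
print(len(a_range), a_range[2])
```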


### Math with iterables

You can do math with the standard library objects, but you don't always get what you expect.

In [None]:
# "Multiplying" a string repeats it
name * 2

'JohnJohn'

In [74]:
# The same is true of lists -- even if of all numbers
numeric_list = [1, 2, 3]
numeric_list * 2

[1, 2, 3, 1, 2, 3]

If we want to do math on `numeric_list`, we need to use a loop or a list comprehension, e.g.:
```python
doubled_list = [x * 2 for x in numeric_list]
```
But we might also want to put the list into an object that more easily accommodates math operations, such as a NumPy array.
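
As a side-by-side sketch of the two approaches (assuming NumPy is installed; `doubled_array` is an illustrative name):

```python
import numpy as np

numeric_list = [1, 2, 3]

# Pure Python: build a new list element by element
doubled_list = [x * 2 for x in numeric_list]

# NumPy: arithmetic applies to every element of the array
doubled_array = np.array(numeric_list) * 2

print(doubled_list)
print(doubled_array)
```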

# The NumPy Library

NumPy is a powerful library for numerical computing in Python. It provides support for arrays, matrices, and a wide range of mathematical functions.

In [None]:
# Import the NumPy package and give it
# the alias `np`
import numpy as np

In [None]:
# Create a numpy array with the np.array() function
our_array = np.array([1, 2, 3, 4, 5])
print("What type is our_array? ", type(our_array))
print("What is the shape of our_array? ", our_array.shape)
print(our_array)

What type is our_array?  <class 'numpy.ndarray'>
What is the shape of our_array?  (5,)
[1 2 3 4 5]


In [75]:
# Basic operations on our_array
our_array * 2

array([ 2,  4,  6,  8, 10])

In [76]:
# numpy arrays use element by element operations by default
print("Product of the arrays: ", our_array * our_array)
print("Sum of the arrays: ", our_array + our_array)

Product of the arrays:  [ 1  4  9 16 25]
Sum of the arrays:  [ 2  4  6  8 10]


In [79]:
our_other_array = np.array([6, 7, 8, 9, 10, 11])
our_array * our_other_array  # This will give an error because the
# shapes are different (a "broadcasting" error, which means the arrays
# don't conform)

ValueError: operands could not be broadcast together with shapes (5,) (6,) 
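
Broadcasting does succeed when shapes are compatible, for example when one operand is a scalar or when a dimension has length 1. A minimal sketch (the names `column` and `grid` are illustrative):

```python
import numpy as np

our_array = np.array([1, 2, 3, 4, 5])

# A scalar is broadcast across every element
shifted = our_array + 10
print(shifted)

# A (5, 1) column and a (5,) row broadcast to a (5, 5) grid
column = our_array.reshape(5, 1)
grid = column * our_array
print(grid.shape)

# Shapes (5,) and (6,) cannot be aligned, so NumPy raises ValueError
try:
    our_array * np.arange(6)
except ValueError as err:
    print("Broadcasting failed:", err)
```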

In [80]:
# NumPy has functions for vectorized operations
# We can compute the dot (inner) product of our_array with itself as
np.dot(our_array, our_array)  # Dot product

55
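
To connect this to the element-by-element operations above: the dot product of two 1-D arrays is the sum of their element-wise products. The sketch below also shows the `@` operator; the small matrix `m` is made up for illustration.

```python
import numpy as np

our_array = np.array([1, 2, 3, 4, 5])

# The dot product is the sum of the element-wise products
manual = (our_array * our_array).sum()
with_dot = np.dot(our_array, our_array)
with_operator = our_array @ our_array  # @ is the matrix-multiplication operator
print(manual, with_dot, with_operator)

# For 2-D arrays, @ performs matrix multiplication
m = np.array([[1, 2], [3, 4]])
print(m @ m)
```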

In [55]:
# N-dimensional arrays are possible
# e.g., 3 dims
d3_array = np.array([[[ 1, 2, 3],
                       [ 4, 5, 6]],
                      [[ 7, 8, 9],
                       [10, 11, 12]]])
print("What is the shape of d3_array? ", d3_array.shape)
print(d3_array)

What is the shape of d3_array?  (2, 2, 3)
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


In [82]:
# Array "slicing"
# slice arrays with the ":"
d3_array[1, :, :]

array([[ 7,  8,  9],
       [10, 11, 12]])

In [None]:
# Array "slicing"
# a number before the colon means start at that element
# a number after the colon means end at that element (not including that element)
d3_array[1, 0:1, :]  # within index 1 of the first dimension, select the first "slice" (index 0) of the second dimension

array([[7, 8, 9]])
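
Note the double brackets in the result: a slice such as `0:1` keeps the dimension, while a plain integer index drops it. A sketch comparing the two, rebuilding the same array values with `np.arange` for brevity:

```python
import numpy as np

# Same values as d3_array above, built with arange and reshape
d3_array = np.arange(1, 13).reshape(2, 2, 3)

# An integer index drops that dimension
row = d3_array[1, 0, :]

# A slice keeps the dimension, even when it selects one element
row_kept = d3_array[1, 0:1, :]
print(row.shape, row_kept.shape)

# Omitting the start or stop means "from the beginning" or "to the end"
last_two_columns = d3_array[:, :, 1:]
print(last_two_columns.shape)
```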

In [2]:
# Special arrays
import numpy as np
a = np.zeros((3, 4))  # 3 rows, 4 columns
print(a)


[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [4]:
b = np.ones((2, 3))  # 2 rows, 3 columns
print(b)

[[1. 1. 1.]
 [1. 1. 1.]]


In [5]:
I = np.eye(3)  # 3x3 identity matrix
print(I)


[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [6]:
c = np.empty_like(a)  # same shape and dtype as a; values are uninitialized, so the zeros printed here are incidental
print(c)


[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
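
Because `np.empty_like` only allocates memory, its contents should be assigned before they are read. A minimal sketch contrasting it with `np.zeros_like` (the fill value 7.0 is arbitrary):

```python
import numpy as np

a = np.zeros((3, 4))

# empty_like allocates memory without initializing it, so the
# values are arbitrary until assigned
c = np.empty_like(a)
c[:] = 7.0  # fill every element before use

# zeros_like guarantees zeros
z = np.zeros_like(a)
print(c.shape, z.sum())
```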


In [7]:
d = np.ones_like(a)
print(d)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


# Pandas

Pandas is the main library for data manipulation and analysis in Python.  The main object in Pandas is the DataFrame, which is a 2-dimensional labeled data structure with columns of potentially different types.

In [8]:
import pandas as pd

In [9]:
football_dict = {
    "Year": [2022, 2023, 2024],
    "USC": [8, 5, 9],
    "Texas": [8, 12, 13],
    "UGA": [15, 13, 11]
}

In [10]:
df = pd.DataFrame(football_dict)
print("What type is df? ", type(df))

What type is df?  <class 'pandas.core.frame.DataFrame'>


In [11]:
df

   Year  USC  Texas  UGA
0  2022    8      8   15
1  2023    5     12   13
2  2024    9     13   11


In [12]:
df.describe()

         Year       USC     Texas   UGA
count     3.0       3.0       3.0   3.0
mean   2023.0  7.333333      11.0  13.0
std       1.0  2.081666  2.645751   2.0
min    2022.0       5.0       8.0  11.0
25%    2022.5       6.5      10.0  12.0
50%    2023.0       8.0      12.0  13.0
75%    2023.5       8.5      12.5  14.0
max    2024.0       9.0      13.0  15.0


In [13]:
df["USC"]

0    8
1    5
2    9
Name: USC, dtype: int64

In [15]:
df["USC"].mean()

7.333333333333333
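
A few more common DataFrame operations, sketched on the same data: selecting several columns, filtering rows with a condition, and adding a computed column. The column name `Texas_minus_USC` is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2022, 2023, 2024],
    "USC": [8, 5, 9],
    "Texas": [8, 12, 13],
    "UGA": [15, 13, 11],
})

# Double brackets select several columns and return a DataFrame
two_teams = df[["USC", "Texas"]]

# A boolean condition filters rows
good_usc_years = df[df["USC"] >= 8]

# Assigning to a new column name adds a computed column
df["Texas_minus_USC"] = df["Texas"] - df["USC"]

print(two_teams.mean())
print(good_usc_years["Year"].tolist())
print(df)
```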

In [16]:
!ls

ACME_NumpyIntro.pdf             PS2.fdb_latexmk
ACME_ObjectOriented.pdf         PS2.fls
ACME_Pandas1.pdf                PS2.log
ACME_Pandas2.pdf                PS2.out
ACME_Pandas3.pdf                PS2.pdf
ACME_Pandas4.pdf                PS2.synctex.gz
ACME_StandardLibrary.pdf        PS2.tex
BuiltinTypes.ipynb              PythonBasics.ipynb
DataFiles                       PythonDescribe.ipynb
DataFunctions.ipynb             PythonFuncs.ipynb
InClassExample.ipynb            PythonNumpyPandas.ipynb
InClass_BasicLibraryNumpy.ipynb PythonReadIn.ipynb
InClass_Functions.ipynb         PythonReshape.ipynb
InClass_Pandas.ipynb            README.md
InClass_PythonBasics.ipynb      Untitled.ipynb
Notes.aux                       Untitled1.ipynb
Notes.fdb_latexmk               __pycache__
Notes.fls                       fibo.py
Notes.log                       kisa_2015.csv
Notes.out                       kisa_df.pkl
Notes.pdf                       my_func.py
Notes.synctex.gz

In [17]:
kisa_df = pd.read_csv("kisa_2015.csv")

In [18]:
kisa_df.head()  # show the first 5 rows of the dataframe

   month  grdatn  marstat  age  class  region  state  hours  mlr  natvty  ...  homeown  hoursu1b  hoursu1b_t1  se15u  se15u_t1  ent015u  ent015ua  vet       wgtat      wgtat1
0     12      42        5   57      4       1     14     40    1      57  ...      NaN        40           40      0         0      0.0       0.0    0  269.172442  270.433824
1     12      39        7   26      4       1     14     40    1      57  ...      NaN        40           40      0         0      0.0       0.0    0  403.023478  404.912105
2     12      41        1   43      4       2     41     46    1     110  ...      NaN        46           40      0         0      0.0       0.0    0  402.790075  404.677609
3     12      39        1   38      4       2     41     40    1      57  ...      NaN        40           30      0         0      0.0       0.0    0  342.934489  344.541531
4     12      42        1   51     -1       3     58     -1    6      57  ...      NaN        -1           -1      0         0      0.0       0.0    0  560.224448  562.849743


In [26]:
# group by age and report mean hours - and put in a dataframe
df2 = kisa_df.groupby("age")["hours"].mean().reset_index()
df2

   age      hours
0   20       40.0
1   21       38.0
2   22  34.666667
3   23       17.5
4   24       -1.0
5   26       40.0
6   28       19.4
7   29       16.0
8   30       39.0
9   32       19.5
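
The output above includes a mean of -1.0 for age 24, and `hours` values of -1 also appear in the `head()` output. These may be codes for missing or not-applicable responses rather than actual hours; that interpretation is an assumption, since no codebook is shown here. Under that assumption, the sketch below uses made-up data (not taken from `kisa_2015.csv`) to show how such codes pull the group means down and how to exclude them before grouping.

```python
import pandas as pd

# Made-up data for illustration only (not taken from kisa_2015.csv)
toy = pd.DataFrame({
    "age": [20, 20, 21, 21, 21],
    "hours": [40, -1, 38, -1, 42],
})

# Mean with the -1 codes included
raw_means = toy.groupby("age")["hours"].mean().reset_index()

# Keep only non-negative hours before grouping
# (assumes negative values are codes, not real hours)
valid = toy[toy["hours"] >= 0]
clean_means = valid.groupby("age")["hours"].mean().reset_index()

print(raw_means)
print(clean_means)
```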
