# Week 1 Lesson 1 Python Review

Short review of core concepts in Python, exemplified by objects in the NumPy library.

- Recall basic vocabulary
- Practice markdown syntax

## Libraries and packages

**Library:** collection of code that we can use to perform a specific task in our programs. It can be one or multiple files.

**NumPy:**

- Core library for numerical computing
- Many libraries use NumPy arrays as building blocks
- Computations on NumPy objects are optimized for speed and memory usage

Let's import NumPy with its **standard abbreviation** `np`:

In [1]:
import numpy as np
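
As a rough illustration of how computations on NumPy objects are written, the sketch below squares a small array twice: once element by element, and once as a single array operation. The names `values`, `squares_loop`, and `squares_vec` are made up for this example, and it shows the style of computation rather than a speed benchmark.

```python
import numpy as np

# Illustrative data: the integers 0 through 9
values = np.arange(10)

# Element-by-element computation with a plain Python list comprehension
squares_loop = [int(v) ** 2 for v in values]

# The same computation written as one operation on the whole array
squares_vec = values ** 2

print(squares_loop)
print(squares_vec)
```

Both approaches give the same numbers; the array version avoids the explicit Python loop.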

## Variables

**Variable:** a name we assign to a particular object in Python

Example:

In [3]:
# Assign a small array to a variable a
a = np.array([[1, 1, 2], [3, 5, 8]])

To view a variable's value in a Jupyter notebook:

In [4]:
# Run cell with variable name to show value
a

array([[1, 1, 2],
       [3, 5, 8]])

In [5]:
# Use `print` function to print value
print(a)

[[1 1 2]
 [3 5 8]]


# Pandas Series and Data Frames

In [2]:
import pandas as pd
import numpy as np

The first core object of pandas is the series. A series is a one-dimensional array of indexed data.

The main difference between a pandas.Series and a NumPy array is that a pandas.Series has an index. Let’s see the difference:

In [2]:
# A numpy array
arr = np.random.randn(4) # random values from the standard normal distribution
print(type(arr))
print(arr, "\n")

# A pandas series made from the previous array
s = pd.Series(arr)
print(type(s))
print(s)

<class 'numpy.ndarray'>
[-2.46016621  0.63789792 -0.51552152  0.81907695] 

<class 'pandas.core.series.Series'>
0   -2.460166
1    0.637898
2   -0.515522
3    0.819077
dtype: float64
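
One practical consequence of the index is that elements of a series can be looked up by label, while a NumPy array is accessed by integer position. The sketch below uses illustrative values and labels that are not part of the lesson's random example.

```python
import numpy as np
import pandas as pd

# Illustrative values and labels
arr = np.array([10, 20, 30])
s = pd.Series(arr, index=['a', 'b', 'c'])

# A NumPy array is accessed by integer position
print(arr[1])

# A series can be accessed by its index label
print(s['b'])

# The index itself is available as an attribute
print(s.index)
```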


## Creating a pandas.Series

The basic method to create a pandas.Series is to call

`s = pd.Series(data, index=index)`

The `data` parameter can be:

- a list or NumPy array,
- a Python dictionary, or
- a single number, boolean (True/False), or string.

In [4]:
# A series from a numpy array 
pd.Series(np.arange(3), index=[2023, 2024, 2025])

2023    0
2024    1
2025    2
dtype: int64

In [1]:
# A series from a list of strings with default index
pd.Series(['EDS 220', 'EDS 222', 'EDS 223', 'EDS 242'])

0    EDS 220
1    EDS 222
2    EDS 223
3    EDS 242
dtype: object

In [9]:
# A pandas series from a dictionary
# Construct dictionary
d = {'key_0':2, 'key_1':'3', 'key_2':5}

# Initialize series using a dictionary
pd.Series(d)

key_0    2
key_1    3
key_2    5
dtype: object
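
The `object` dtype in this output comes from the dictionary mixing integer values with the string `'3'`. As a small sketch, the comparison below uses a hypothetical second dictionary holding only integers to show how the dtype changes when all values share a type.

```python
import pandas as pd

# Same dictionary as in the lesson: integers mixed with a string
mixed = pd.Series({'key_0': 2, 'key_1': '3', 'key_2': 5})

# Hypothetical variant in which every value is an integer
ints = pd.Series({'key_0': 2, 'key_1': 3, 'key_2': 5})

# Mixed value types fall back to the generic object dtype
print(mixed.dtype)

# Values of one type get an integer dtype
print(ints.dtype)
```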

In [8]:
# A pandas series from a single value
pd.Series(3.0, index = ['A', 'B', 'C'])

A    3.0
B    3.0
C    3.0
dtype: float64

## Simple operations


In [10]:
# Define a series
s = pd.Series([98, 73, 65], index=['Andrea', 'Beth', 'Carolina'])

# Divide each element in series by 10
print(s / 10, '\n')

# Take the exponential of each element in series
print(np.exp(s), '\n')

# Original series is unchanged
print(s)


Andrea      9.8
Beth        7.3
Carolina    6.5
dtype: float64 

Andrea      3.637971e+42
Beth        5.052394e+31
Carolina    1.694889e+28
dtype: float64 

Andrea      98
Beth        73
Carolina    65
dtype: int64


We can also produce new pandas.Series with True/False values indicating whether the elements in a series satisfy a condition or not:

In [11]:
s > 70

Andrea       True
Beth         True
Carolina    False
dtype: bool
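
A common use of such a True/False series is to select the elements that satisfy the condition. The sketch below reuses the scores from this section; the name `above_70` is introduced only for illustration.

```python
import pandas as pd

s = pd.Series([98, 73, 65], index=['Andrea', 'Beth', 'Carolina'])

# Boolean series marking which elements are greater than 70
above_70 = s > 70

# Indexing with the boolean series keeps only the True positions
print(s[above_70])
```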

## Identify missing values

In [12]:
# Series with NAs in it
s = pd.Series([1, 2, np.nan, 4, np.nan])
s

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64

In [13]:
# Check if series has NAs
s.hasnans

True

In [14]:
s.isna()

0    False
1    False
2     True
3    False
4     True
dtype: bool
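
Because `isna()` returns a boolean series, it can be combined with other operations. As a minimal sketch on the same series, the example below counts the missing values and lists the index labels where they occur; `n_missing` is an illustrative name.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, np.nan])

# True counts as 1 when summed, so this counts the missing values
n_missing = s.isna().sum()
print(n_missing)

# Index labels of the missing values
print(s[s.isna()].index.tolist())
```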

# Data Frames

## Creating a pandas.DataFrame
There are many ways of creating a pandas.DataFrame. We present a simple one in this section.

Each column of a pandas.DataFrame is a pandas.Series. In fact, a pandas.DataFrame can be thought of as a dictionary of pandas.Series, with each column name being a key and the column's values being that key’s value. Thus, we can create a pandas.DataFrame in this way:

In [15]:
# Initialize dictionary with columns' data 
d = {'col_name_1' : pd.Series(np.arange(3)),
     'col_name_2' : pd.Series([3.1, 3.2, 3.3]),
     }

# Create data frame
df = pd.DataFrame(d)
df

|   | col_name_1 | col_name_2 |
|---|------------|------------|
| 0 | 0          | 3.1        |
| 1 | 1          | 3.2        |
| 2 | 2          | 3.3        |
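
The dictionary analogy also works in the other direction: selecting a column by its name, as one would look up a dictionary key, returns a pandas.Series. The sketch below rebuilds the same data frame; `col` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_name_1': pd.Series(np.arange(3)),
                   'col_name_2': pd.Series([3.1, 3.2, 3.3])})

# Selecting a column by name, like a dictionary key, returns a series
col = df['col_name_1']
print(type(col))

# The column names play the role of the dictionary keys
print(list(df.columns))
```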


In [16]:
# Change index
df.index = ['a','b','c']
df

|   | col_name_1 | col_name_2 |
|---|------------|------------|
| a | 0          | 3.1        |
| b | 1          | 3.2        |
| c | 2          | 3.3        |


## Check-in exercise

The integer number -999 is often used to represent missing values. Create a pandas.Series named s with four integer values, two of which are -999. The index of this series should be the letters A through D.

In the pandas.Series documentation, look for the method mask(). Use this method to update the series s so that the -999 values are replaced by NA values. HINT: check the first example in the method’s documentation.

In [9]:
# Create a series in which -999 marks the missing values
s = pd.Series([24, -999, 3, -999], index=['A', 'B', 'C', 'D'])
s

A     24
B   -999
C      3
D   -999
dtype: int64

In [14]:
# Replace the -999 values with NA; by default mask() fills with NaN
s = s.mask(s == -999)
s

A    24.0
B     NaN
C     3.0
D     NaN
dtype: float64
