# Always run all cells, starting with module imports such as in the cell below

In [None]:
import numpy as np
from datascience import * 

## Learning from Lab 00/01
Working with Jupyter Notebooks
- Expressions
- Variables & upcoming Data Types
    - Strings
    - Integer
    - Float
- built-in functions: print("Hello World")
- Kernel
- errors
- check work with check('tests/q2_1.py')
- Shortcuts
- Exporting your notebook to HTML
- Running the tests
- Code vs Markdown cells

### Expressions

Combinations of operators (+, -, *, /, and, or, etc.) and operands (numbers, strings, etc.) that evaluate to a value.

In [None]:
# The right-hand side below is an expression; the = sign assigns its value to c
c = 212 / 31 + 6

### Variables
#### Strings

A string is a sequence of characters, including spaces, written inside quotation marks.

In [None]:
first_name = "Octavius"
last_name = "Catto"

print(first_name, last_name)

Numbers can be in strings too, but operators don't interpret them mathematically. 

In [None]:
str_1 = "1"
str_2 = "2"

print(str_1 + str_2)
type(str_1 + str_2)
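
To do arithmetic with digits stored in strings, convert them to numbers first. The sketch below reuses the names str_1 and str_2 from the cell above and the built-in int() function; the names joined and total are only illustrative.

```python
str_1 = "1"
str_2 = "2"

# On strings, + joins the text end to end (concatenation)
joined = str_1 + str_2

# int() converts each string to an integer, so + now adds the numbers
total = int(str_1) + int(str_2)

print(joined)  # 12
print(total)   # 3
```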

#### Integers

In [None]:
two = 2
two + two

In [None]:
type(two)

#### Floating point numbers

In [None]:
approx_pi = 22 / 7
print(approx_pi)
print(type(approx_pi))
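
The / operator always produces a float, even when the division comes out even. A short sketch of the difference between / and the floor-division operator //, with illustrative variable names:

```python
quotient = 22 / 7      # true division: the result is a float
whole_part = 22 // 7   # floor division: an int when both operands are ints
even_split = 4 / 2     # still a float (2.0), even though it divides evenly

print(type(quotient), type(whole_part), type(even_split))
```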

### Built-in print() function

In [None]:
print("Welcome to Elements of Data Science!")

## Upcoming Learning Goals from Lab 02
Data Types: See video of same name
- Expressions & numbers revisited
- Variables & Data Types
    - Strings
    - Integer
    - Float
- more built-in functions: max(), min(), abs(), pow(), round()
- Translating science into Python formulas: Newton's equations
- importing code functionality: import math
    - math.pi
    - math.sqrt()
    - math.log()
    - math.factorial()
- parentheses and order of operations (PEMDAS)
- arrays: from datascience import *
    - make_array(0.125, 4.75, -1.3)
- arrays: import numpy as np
    - np.array([0, 1, -1, math.pi, math.e])
- lists: [1, 2, 3, 4]

### More built-in functions

In [None]:
maximum_number = max(3,4.4)
maximum_number
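
The other built-in functions named in the learning goals work the same way as max(). A brief sketch with made-up input values:

```python
smallest = min(3, 4.4)        # the smaller of the two values
distance = abs(-7.5)          # absolute value
power = pow(2, 10)            # same as 2 ** 10
rounded = round(3.14159, 2)   # round to 2 decimal places

print(smallest, distance, power, rounded)  # 3 7.5 1024 3.14
```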

### Importing modules to add functionality

In [None]:
import math
radius = 4
area_of_circle = math.pi * radius**2 
print(area_of_circle)

In [None]:
math.e

In [None]:
area_of_circle = radius**2 * math.pi
area_of_circle
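
Besides constants such as math.pi and math.e, the math module provides functions. A minimal sketch of the three listed in the learning goals, using illustrative inputs:

```python
import math

side = math.sqrt(16)          # square root, returned as a float
natural_log = math.log(math.e)  # natural logarithm (base e) by default
ways = math.factorial(5)      # 5 * 4 * 3 * 2 * 1

print(side, ways)  # 4.0 120
```

Note that math.pi and math.e are written without parentheses because they are values, while sqrt, log, and factorial are called with an argument.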

### Different ways to import and the risk in using import *

In [None]:
# Plain, vanilla import
import math

# Must prefix each name with the whole module name
math.pi

In [None]:
# Renaming, often to abbreviate, on import
import math as m

# Now you prefix each name with the shorter name
m.pi

In [None]:
# Bringing all of the module's names into your working namespace
from math import *

# Now you don't need any prefix to use the names
pi

In [None]:
# The danger of import *
# What if I have a variable name pi
pi = "My favorite is pumpkin!"
pi

In [None]:
# Now I import the math module
from math import *
pi

In [None]:
# The math module ate my pi!
# If I import the other ways, the variables remain distinct
import math
pi = "My favorite is pumpkin!"
print(pi)
print(math.pi)

# The moral of the story is be careful with import * or you might lose your pi!
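
A middle ground between import math and import * is to import only the names you need, optionally renaming one to avoid a clash. This is a sketch; the alias math_pi and the example values are assumptions for illustration.

```python
# Import two specific names; rename pi so it cannot overwrite our own variable
from math import sqrt, pi as math_pi

pi = "My favorite is pumpkin!"   # our own variable stays intact

radius = 4
circumference = 2 * math_pi * radius
hypotenuse = sqrt(3**2 + 4**2)

print(pi)
print(hypotenuse)  # 5.0
```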

In [None]:
# You can check to see what names will be imported
import math
dir(math)

Learning how different functions behave is an important part of learning a programming language. A Jupyter notebook can assist in remembering the names and effects of different functions. When editing a code cell, press the tab key after typing the beginning of a name to bring up a list of ways to complete that name. For example, press tab after math. to see all of the functions available in the math module. Typing will narrow down the list of options. To learn more about a function, place a ? after its name. For example, typing math.log? will bring up a description of the log function in the math module.

In [None]:
math.pi  # delete "pi", then press Tab after the dot to list the available names

In [None]:
math.log?

### Parentheses () and PEMDAS
In standard math notation, the first expression below is

$$6 + 6 \times 5 - 6 \times 3^2 \times \frac{2^3}{4} \times 7,$$

while the second expression below is

$$6 + (6 \times 5 - (6 \times 3))^2 \times (\frac{(2^3)}{4} \times 7).$$



In [None]:
6 + 6*5 - 6 * 3**2 * 2**3 / 4 * 7

In [None]:
6 + (6*5 - (6*3))**2 * ((2**3)/4 * 7)

Think about it: which of these parentheses are necessary?
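
One way to check an answer is to remove a pair of parentheses and compare the results. The sketch below does this for the second expression; the variable names are illustrative.

```python
# The second expression, with every pair of parentheses
with_all = 6 + (6*5 - (6*3))**2 * ((2**3)/4 * 7)

# Only the pair around the subtraction is kept. The others are redundant:
# * binds tighter than -, ** binds tighter than /, and * and / run left to right
fewest = 6 + (6*5 - 6*3)**2 * 2**3/4 * 7

# With no parentheses at all, ** applies only to the 3, so the value changes
no_parens = 6 + 6*5 - 6 * 3**2 * 2**3 / 4 * 7

print(with_all, fewest, no_parens)  # 2022.0 2022.0 -720.0
```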

### The datascience module

In [None]:
import datascience
dir(datascience)

In [None]:
# We use this in our class because not that many names are imported,
# and they are names we are unlikely to use ourselves, so the 
# convenience outweighs the risk
from datascience import *

### Arrays
Arrays are sequences of data stored in a variable. All the data in the sequence must be the same data type.

In [None]:
# make_array takes the elements as separate arguments, not as a list
x = make_array(2, 4, 6, 8)

In [None]:
names = make_array("tacos", "burritos", "enchiladas")
names

In [None]:
# Arrays are so useful because we can operate on all of the elements in an array at once
x + 10

In [None]:
# Square every element in the array
y = x**2
y
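
Arithmetic also works between two arrays of the same length, pairing up elements by position. The sketch below builds its arrays with np.array so that it runs without the datascience package; arrays made with make_array are NumPy arrays too, so they should behave the same way here.

```python
import numpy as np

x = np.array([2, 4, 6, 8])
y = x ** 2       # square every element

sums = x + y     # first with first, second with second, and so on
ratios = y / x   # elementwise division gives floats

print(sums)
print(ratios)
```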

### Lists
Lists are another Python data type that can store a sequence, but the elements of a list do not all have to be the same data type. Unlike with an array, you cannot apply arithmetic to every element of a list at once, but lists are useful when your data is heterogeneous.

In [None]:
sample_list = ['David', 'biked', 50, 'miles']
sample_list

In [None]:
# Notice that make_array() takes its values as separate arguments, all of one data type
x = make_array(2, 4, 6, 8)
x
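
To see how list arithmetic differs from array arithmetic, compare what + and * do to a list. This is a sketch with illustrative names; the loop-like expression on the last line is a list comprehension, one common way to transform each element of a list.

```python
number_list = [2, 4, 6, 8]

repeated = number_list * 2       # repeats the list; does not double the values
extended = number_list + [10]    # joins two lists; does not add 10 to each value

# Adding a number to a list is an error
try:
    number_list + 10
    raised_error = False
except TypeError:
    raised_error = True

# Transforming every element takes an explicit loop or comprehension
doubled = [value * 2 for value in number_list]

print(repeated)
print(extended)
print(doubled)
```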

## NumPy (short for Numerical Python)
NumPy is a module that adds a lot of functionality for working with arrays. As scientists, we will use it in virtually every Jupyter notebook. In fact, it is used so commonly that nearly everyone imports it under the abbreviated name np.

In [None]:
import numpy as np # numpy is almost always imported as np

In [None]:
# You can also use numpy to create arrays
key_math_constants = np.array([0, 1, -1, math.pi, math.e])

In [None]:
# importing numpy adds hundreds of functions.
# Here is one example that adds up all the elements of the array
np.sum(key_math_constants)
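
A few more NumPy functions follow the same pattern as np.sum: pass in an array and get back a summary or a transformed array. A short sketch using the same array of constants:

```python
import math
import numpy as np

key_math_constants = np.array([0, 1, -1, math.pi, math.e])

total = np.sum(key_math_constants)          # pi + e, because 0 + 1 - 1 cancels out
average = np.mean(key_math_constants)       # the total divided by the 5 elements
largest = np.max(key_math_constants)        # the largest element, math.pi
rounded = np.round(key_math_constants, 2)   # every element rounded to 2 decimals

print(total, average, largest)
print(rounded)
```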

## A sneak peek at the power of Python

In [None]:
!pip install meteostat  # the leading ! runs this line as a shell command, outside the Python kernel

In [None]:
# Import Meteostat library and dependencies
from datetime import datetime
%matplotlib inline
import matplotlib.pyplot as plt
from meteostat import Point, Daily

In [None]:
# Set time period
start = datetime(2023, 1, 1)
end = datetime(2024, 9, 1)
# Create a Point for Philadelphia: latitude 39.981, longitude -75.153, elevation 10 m
location = Point(39.981, -75.153, 10)

In [None]:
# Get daily data from January 1, 2023 through September 1, 2024
data = Daily(location, start, end)
data = data.fetch()
data

In [None]:
# Plot line chart including average, minimum and maximum temperature
data.plot(y=['tavg', 'tmin', 'tmax'])
plt.show()

In [None]:
data.plot(y=['tavg'])
plt.show()

In [None]:
import numpy as np
from scipy.optimize import curve_fit

def sine_func(x, A, f, phi, offset):
    return A * np.sin(2 * np.pi * f * x + phi) + offset

In [None]:
# Convert timestamps (nanoseconds since 1970) to years, so f is in cycles per year
x = data.index.astype(int) / 10**9 / 3600 / 24 / 365
y = data["tavg"]

# Fitting the curve
initial_guess = [15.0, 1.0, 0, 15.0]  # Initial guess for [A, f, phi, offset]
popt, _ = curve_fit(sine_func, x, y, p0=initial_guess)
y_fit = sine_func(x, *popt)

plt.figure(figsize=(10, 6))
plt.scatter(data.index, y, label='Data')
plt.plot(data.index, y_fit, 'r-', label='Fitted Curve')
plt.legend()
plt.xlabel('Date')
plt.ylabel('Temperature (deg C)')
plt.title('Sine Curve Fit')
plt.show()
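
The unit conversion in the cell above matters because the initial guess f = 1.0 only means "one cycle per year" if x is measured in years. The sketch below reproduces the idea with the standard library alone; the helper years_since_epoch, the function sine_model, and the sample dates are assumptions for illustration, not part of the fit.

```python
from datetime import datetime, timezone
import math

SECONDS_PER_YEAR = 3600 * 24 * 365   # the same 365-day year used in the cell above

def years_since_epoch(moment):
    """Convert a datetime to years elapsed since January 1, 1970."""
    return moment.timestamp() / SECONDS_PER_YEAR

def sine_model(x, A, f, phi, offset):
    """Same formula as sine_func, written with math instead of numpy."""
    return A * math.sin(2 * math.pi * f * x + phi) + offset

new_year_2023 = datetime(2023, 1, 1, tzinfo=timezone.utc)
new_year_2024 = datetime(2024, 1, 1, tzinfo=timezone.utc)

# 2023 has 365 days, so these two dates are one unit apart on the x axis
elapsed = years_since_epoch(new_year_2024) - years_since_epoch(new_year_2023)

# With f = 1 cycle per year, the model repeats after one unit of x
now = sine_model(years_since_epoch(new_year_2023), 15.0, 1.0, 0, 15.0)
later = sine_model(years_since_epoch(new_year_2024), 15.0, 1.0, 0, 15.0)

print(round(elapsed, 6))
print(round(now, 3), round(later, 3))
```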

In [None]:
# Set time period
start = datetime(2019, 1, 1)
end = datetime(2024, 9, 1)

# Create a Point for Philadelphia: latitude 39.981, longitude -75.153, elevation 10 m
location = Point(39.981, -75.153, 10)

# Get daily data from January 1, 2019 through September 1, 2024
data = Daily(location, start, end)
data = data.fetch()

In [None]:
# Convert timestamps (nanoseconds since 1970) to years, so f is in cycles per year
x = data.index.astype(int) / 10**9 / 3600 / 24 / 365
y = data["tavg"]

# Fitting the curve
initial_guess = [15.0, 1.0, 0, 15.0]  # Initial guess for [A, f, phi, offset]
popt, _ = curve_fit(sine_func, x, y, p0=initial_guess)
y_fit = sine_func(x, *popt)

plt.figure(figsize=(10, 6))
plt.scatter(data.index, y, label='Data')
plt.plot(data.index, y_fit, 'r-', label='Fitted Curve')
plt.legend()
plt.xlabel('Date')
plt.ylabel('Temperature (deg C)')
plt.title('Sine Curve Fit')
plt.show()

In [None]:
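# Full temperature swing (twice the fitted amplitude),
# converted from Celsius degrees to Fahrenheit degrees
# abs() guards against a fit that returns a negative amplitude
temperature_swing = 2 * abs(popt[0]) * 1.8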

round(temperature_swing, 1)
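
The cell above multiplies by 1.8 but does not add 32 because a swing is a difference between two temperatures, and the +32 offset cancels when two Fahrenheit readings are subtracted. A small sketch with made-up temperatures (not fitted results) and a hypothetical helper c_to_f:

```python
def c_to_f(temp_c):
    """Convert a temperature reading from Celsius to Fahrenheit."""
    return temp_c * 1.8 + 32

low_c, high_c = 0.0, 25.0   # illustrative values only

swing_c = high_c - low_c
swing_f = c_to_f(high_c) - c_to_f(low_c)   # the two +32 terms cancel

print(swing_c * 1.8, swing_f)
```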