## Python Basics

#### Become familiar with the `indexes` data set.

### SECTION 1: SETUP

1. Import the `pandas` package with the common `pd` alias.
2. Use `help()` to explore the pandas `read_csv()` function.
3. Use the pandas `read_csv()` function to load the data set.
4. Save the data set to a variable named `indexes`.
5. Save the `symbol`, `date`, and `adjusted` columns to variables.

```python
# Import packages used
import pandas as pd
```

```python
# Understand how to use the read_csv() function
# help(pd.read_csv)

# Save the data set to a variable for easy referencing
indexes = pd.read_csv("/Users/kellsworth/Developer/GitHub/python-dabbling/posit-academy/src/files/indexes.csv")

# Save the symbol, date, and adjusted columns to variables
symbol = indexes["symbol"]
date = indexes["date"]
adjusted = indexes["adjusted"]
```
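A minimal sketch of the same loading steps, assuming a few rows copied from the printed output of `indexes` are placed in an in-memory CSV (`io.StringIO`) instead of the original file path. The `parse_dates` argument is an optional addition the notebook does not use; it converts the `date` strings to datetime values. The names `sample`, `date_col`, and `subset` are illustrative only.

```python
import io

import pandas as pd

# A few rows copied from the printed output of indexes; only the symbol,
# date, open, and close columns are included for brevity
sample_csv = io.StringIO(
    "symbol,date,open,close\n"
    "^GSPC,2021-01-04,3764.610107,3700.649902\n"
    "^GSPC,2021-01-05,3698.020020,3726.860107\n"
    "^TNX,2021-12-30,1.531000,1.515000\n"
    "^TNX,2021-12-31,1.505000,1.512000\n"
)

# parse_dates converts the date column from strings to datetime64 values
sample = pd.read_csv(sample_csv, parse_dates=["date"])

# Single brackets return a Series; a list of names returns a DataFrame
date_col = sample["date"]
subset = sample[["symbol", "date", "close"]]

print(type(date_col).__name__)  # prints "Series"
print(type(subset).__name__)    # prints "DataFrame"
print(sample.dtypes)
```

Selecting one column, as the notebook does with `indexes["symbol"]`, yields a `Series`, which is why methods such as `value_counts()` can be called on it directly.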

### SECTION 2: EXPLORE DATA

1. Display the contents of `indexes`.
2. Display the contents of the columns saved to variables.
3. Use the built-in `min()` function to find the first date in `date`.
4. Use the built-in `max()` function to find the last date in `date`.
5. Use the pandas `value_counts()` method to count the number of records for each symbol.

```python
# Display the contents of the full data set
print(indexes)

# Display the contents of the individual columns
print(symbol)
print(date)
print(adjusted)

# Understand how to use the value_counts() method
# help(pd.Series.value_counts)

# Find the first and last dates, and count the records for each symbol
print("First day: ", min(date))
print("Last day: ", max(date))
print("Count per symbol: ", "\n", symbol.value_counts())
```

```
     symbol        date         open         high          low        close  \
0     ^GSPC  2021-01-04  3764.610107  3769.989990  3662.709961  3700.649902   
1     ^GSPC  2021-01-05  3698.020020  3737.830078  3695.070068  3726.860107   
2     ^GSPC  2021-01-06  3712.199951  3783.040039  3705.340088  3748.139893   
3     ^GSPC  2021-01-07  3764.709961  3811.550049  3764.709961  3803.790039   
4     ^GSPC  2021-01-08  3815.050049  3826.689941  3783.600098  3824.679932   
...     ...         ...          ...          ...          ...          ...   
1507   ^TNX  2021-12-27     1.489000     1.494000     1.476000     1.481000   
1508   ^TNX  2021-12-28     1.486000     1.489000     1.455000     1.481000   
1509   ^TNX  2021-12-29     1.505000     1.558000     1.505000     1.543000   
1510   ^TNX  2021-12-30     1.531000     1.550000     1.513000     1.515000   
1511   ^TNX  2021-12-31     1.505000     1.522000     1.491000     1.512000   

          volume     adjusted  
0     5015000000  3
```
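The built-in `min()` and `max()` work on the `date` column because ISO 8601 strings (`YYYY-MM-DD`) sort in the same order as the dates they represent. A minimal sketch, assuming a handful of rows copied from the printed output of `indexes`, compares them with the equivalent pandas `Series` methods and extends the idea with `groupby()` to get a first and last date per symbol. The names `sample`, `counts`, and `date_range` are illustrative.

```python
import io

import pandas as pd

# A few rows copied from the printed output of indexes (dates left as strings)
sample = pd.read_csv(
    io.StringIO(
        "symbol,date,close\n"
        "^GSPC,2021-01-04,3700.649902\n"
        "^GSPC,2021-01-05,3726.860107\n"
        "^GSPC,2021-01-06,3748.139893\n"
        "^TNX,2021-12-30,1.515000\n"
        "^TNX,2021-12-31,1.512000\n"
    )
)

# Lexicographic order of ISO date strings matches chronological order
first_day = min(sample["date"])
last_day = max(sample["date"])

# The Series methods return the same values without a Python-level loop
assert first_day == sample["date"].min()
assert last_day == sample["date"].max()

# value_counts() is a Series method: number of rows for each symbol
counts = sample["symbol"].value_counts()

# groupby() gives the first and last date separately for each symbol
date_range = sample.groupby("symbol")["date"].agg(["min", "max"])

print("First day:", first_day)  # prints "First day: 2021-01-04"
print("Last day:", last_day)    # prints "Last day: 2021-12-31"
print(counts)
print(date_range)
```

The per-symbol version matters when symbols do not cover the same trading days; a single overall `min()`/`max()` would hide that.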

### SECTION 3: EXTENSION

Expand on your work above by investigating something in the data set that interests you.

```python
# Create a copy of the data frame so the original is preserved
indexes_extension = indexes.copy()

# Add daily change columns: absolute (close - open) and relative to the open
# (daily_chng_pct is a fraction, e.g. 0.0005 means 0.05%)
indexes_extension["daily_chng_val"] = indexes_extension["close"] - indexes_extension["open"]
indexes_extension["daily_chng_pct"] = indexes_extension["daily_chng_val"] / indexes_extension["open"]

# Understand how to use the pandas groupby() method
# help(indexes_extension.groupby)

# Compare average open, close, and daily change across symbols
agg_extension = indexes_extension.groupby("symbol")[["open", "close", "daily_chng_val", "daily_chng_pct"]].mean()
print("Averages by symbol: ", "\n", agg_extension)
```

```
Averages by symbol:  
                 open         close  daily_chng_val  daily_chng_pct
symbol                                                            
^DJI    34039.094789  34055.289690       16.194902        0.000509
^FVX        0.857897      0.857722       -0.000175        0.000345
^GSPC    4271.170712   4273.385635        2.214922        0.000542
^IRX        0.035258      0.034615       -0.000643       -0.005619
^IXIC   14371.278057  14371.662404        0.384347        0.000063
^TNX        1.440988      1.440452       -0.000536       -0.000046
```
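Plain assignment such as `indexes_extension = indexes` binds a second name to the same DataFrame object, so adding columns through either name changes both; `DataFrame.copy()` returns an independent object. A minimal sketch, assuming two `^GSPC` rows copied from the printed output of `indexes`, demonstrates the difference and then uses named aggregation to report the standard deviation of the daily change next to its mean. The standard deviation is an addition not in the notebook, included as one common way to gauge how much a symbol fluctuates; the names `alias`, `independent`, `summary`, `mean_pct`, and `std_pct` are illustrative.

```python
import pandas as pd

# Two rows copied from the printed output of indexes (^GSPC, 2021-01-04 and 2021-01-05)
original = pd.DataFrame(
    {
        "symbol": ["^GSPC", "^GSPC"],
        "open": [3764.610107, 3698.020020],
        "close": [3700.649902, 3726.860107],
    }
)

# Plain assignment: alias and original refer to the SAME DataFrame
alias = original
alias["daily_chng_val"] = alias["close"] - alias["open"]
print("daily_chng_val" in original.columns)  # prints "True": original was modified

# Start again from a DataFrame without the added column
original = original.drop(columns="daily_chng_val")

# .copy() (deep by default) gives an independent DataFrame
independent = original.copy()
independent["daily_chng_val"] = independent["close"] - independent["open"]
independent["daily_chng_pct"] = independent["daily_chng_val"] / independent["open"]
print("daily_chng_val" in original.columns)  # prints "False": original is untouched

# Named aggregation: output column name = (input column, aggregation function)
summary = independent.groupby("symbol").agg(
    mean_pct=("daily_chng_pct", "mean"),
    std_pct=("daily_chng_pct", "std"),
)
print(summary)
```

A mean daily change close to zero can hide large swings in both directions, which the standard deviation column makes visible.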
