In [11]:
### Covered in this section are the following concepts:
# Reading data (CSV, Excel)
# Viewing and Inspecting data

### Library Imports
import pandas as pd
import yfinance as yf


### Reading and Writing Data

1. **Reading Data**
   - `pd.read_csv('filepath', sep=',', header='infer', usecols=None, dtype=None, parse_dates=False)` - Read data from a CSV file.
   - `pd.read_excel()` - Read data from an Excel file.
   - `pd.read_json()` - Read data from a JSON formatted string or file.
   - `pd.read_sql()` - Read data from a SQL database query.
   - `pd.read_html()` - Extract data from HTML tables, typically used for web scraping.

2. **Writing Data**
   - `DataFrame.to_csv('name_of_project.csv', sep=',', columns=None, header=True, index=False)` - Write DataFrame to a CSV file. Note that `index` defaults to `True`; passing `index=False` leaves the row index out of the file.
   - `DataFrame.to_excel()` - Write DataFrame to an Excel file.
   - `DataFrame.to_json()` - Write DataFrame to a JSON formatted string or file.
   - `DataFrame.to_sql()` - Write DataFrame to a SQL database table.
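
As a minimal sketch of how the `read_csv` and `to_csv` parameters listed above fit together, the example below reads from an in-memory buffer rather than a real file. The CSV text, the `Date` column, and the variable names are assumptions made for illustration only.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "Date,GOOG,AAPL\n2024-01-02,139.5,185.6\n2024-01-03,140.3,184.2\n"

# Read only the columns we need, pin a dtype, and parse Date as datetime.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Date", "GOOG"],
    dtype={"GOOG": "float64"},
    parse_dates=["Date"],
)

# index=False keeps the RangeIndex out of the output, so reading the
# data back does not pick up an extra unnamed column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), parse_dates=["Date"])
```

Swapping `io.StringIO(...)` for a file path gives the same behavior against a file on disk.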


In [12]:
### Reading and Writing Data
# Read in stock data from .csv
stock_data = pd.read_csv('/Users/mburley/Downloads/stock_data.csv')
# Output stock data to .csv
stock_data.to_csv('stock_data_check.csv', index=False)

#### Data Inspection and Cleaning 
1. **Viewing Data**
   - `DataFrame.head()` - View the first few rows of the DataFrame.
   - `DataFrame.tail()` - View the last few rows of the DataFrame.
   - `DataFrame.info()` - Get a concise summary of the DataFrame, including column types and non-null values.
   - `DataFrame.describe()` - Generate descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset’s distribution.
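
The viewing methods above can be tried on a small frame first. This is a sketch on made-up prices (the `prices` frame and its values are hypothetical), not the notebook's `stock_data`.

```python
import pandas as pd

# Hypothetical prices, ten rows, purely for illustration.
prices = pd.DataFrame({
    "GOOG": [float(100 + i) for i in range(10)],
    "AAPL": [float(200 + 2 * i) for i in range(10)],
})

first_rows = prices.head()    # first 5 rows by default
last_rows = prices.tail(3)    # last 3 rows
summary = prices.describe()   # count, mean, std, min, quartiles, max

# info() prints its summary itself and returns None,
# so it is called without wrapping it in print().
prices.info()
```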

In [13]:
### Data Inspection and Cleaning 

# Print a summary of the DataFrame (info() prints directly and returns None, so no print() is needed)
stock_data.info()
# Print key statistics for each column
print(stock_data.describe())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   GOOG    146 non-null    float64
 1   AAPL    146 non-null    float64
 2   NVDA    146 non-null    float64
dtypes: float64(3)
memory usage: 3.5 KB
             GOOG        AAPL        NVDA
count  146.000000  146.000000  146.000000
mean   161.029251  189.870481   92.274136
std     16.806053   18.834939   23.586497
min    132.409317  164.585999   47.562862
25%    145.177303  173.119530   77.871700
50%    157.625633  185.182297   89.173676
75%    176.585003  195.343193  114.674175
max    192.660004  234.548523  135.580002


#### Data Inspection and Cleaning

2. **Checking and Handling Missing Data**
   - `DataFrame.isnull()` - Check for missing values in the DataFrame.
   - `DataFrame.notnull()` - Check for non-missing values in the DataFrame.
   - `DataFrame.dropna()` - Remove missing values from the DataFrame.
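   - `DataFrame.fillna()` - Fill missing values in the DataFrame with a specified value. For forward or backward filling, recent pandas versions favor `DataFrame.ffill()` and `DataFrame.bfill()` over the deprecated `method` argument.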

3. **Data Type Conversions**
   - `DataFrame.astype()` - Convert the data types of one or more columns in the DataFrame.
   - `pd.to_datetime()` - Convert argument to datetime format, which is essential for time series analysis in pandas.
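
A short sketch of the missing-data and type-conversion methods above, run on a hypothetical frame with one gap per column. The frame and variable names are assumptions for illustration; filling with the column mean is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
df = pd.DataFrame({
    "GOOG": [150.0, np.nan, 152.5],
    "AAPL": [190.0, 191.0, np.nan],
})

missing_per_column = df.isnull().sum()  # number of NaN values in each column
dropped = df.dropna()                   # keeps only rows with no NaN
filled = df.fillna(df.mean())           # fill each gap with its column mean
forward = df.ffill()                    # carry the last valid value forward

# astype returns a new object; float -> int drops the fractional part.
as_int = filled["GOOG"].astype(int)
```

None of these calls modify `df` itself; each returns a new object, which makes it easy to compare strategies before committing to one.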


In [14]:
### Data Inspection and Cleaning

# Drop null values in the df
stock_data.dropna(inplace=True)  # Use inplace=True to modify the DataFrame directly

# Check if any values in the AAPL column are null
stock_data['AAPL'].isnull().any()  # Not displayed mid-cell; wrap in print() to see the result

# Convert the NVDA column to int (this truncates the decimals, so prices lose precision)
stock_data['NVDA'] = stock_data['NVDA'].astype(int)  # Assign back to column to save changes

# Convert a Date column to datetime dtype (left commented out: this CSV has no Date column)
#stock_data['Date'] = pd.to_datetime(stock_data['Date'])  # to_datetime is a top-level pandas function, so the pd. prefix is required

# Check work
stock_data.dtypes


GOOG    float64
AAPL    float64
NVDA      int64
dtype: object
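
Since `stock_data` has no `Date` column, the `pd.to_datetime()` step can be sketched on a hypothetical frame instead. The column names and values below are assumptions, and `errors="coerce"` is one possible way to handle bad entries.

```python
import pandas as pd

# Hypothetical frame: the Date column is an assumption, not part of stock_data.
df = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "not a date"],
    "NVDA": [48.2, 47.6, 47.9],
})

# errors="coerce" turns unparseable strings into NaT instead of raising.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Drop rows whose date failed to parse, then index by date for time series work.
df = df.dropna(subset=["Date"]).set_index("Date")
```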