# Research

1. [Pandas Documentation](https://pandas.pydata.org/docs/user_guide/index.html)

## Keywords

### 1. `pd.Series`, `pd.DataFrame`
   - **`pd.DataFrame`**: A two-dimensional labeled table with rows and columns.
   - **`pd.Series`**: A one-dimensional labeled array; each column of a DataFrame is a Series.
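
A minimal sketch of the relationship between the two types. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [31, 25, 47]})

# Selecting one column returns a Series that shares the DataFrame's index.
ages = df["Age"]

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series
print(df.shape)             # (3, 2)
print(ages.shape)           # (3,)
```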

### 2. `df.drop`

### 3. Data Input/Output
   - **`pd.read_csv`, `df.to_csv`**
   - **`pd.read_excel`, `df.to_excel`**
   - **`pd.read_sql`, `df.to_sql`**

### 4. Indexing
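   - **`df.loc`**: Label-based indexing, e.g. `df.loc[2, 'Age']` or `df.loc[:, 'Age']`.
   - **`df.iloc`**: Integer position-based indexing, e.g. `df.iloc[2, 1]`.
   - **`df.at`**: Fast label-based access to a single value; like `loc`, but for one cell only.
   - **`df.iat`**: Fast integer position-based access to a single value; like `iloc`, but for one cell only.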
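
The four accessors can be compared side by side. This is a small sketch on made-up data; the string index is an assumption chosen so that labels and positions are clearly different.

```python
import pandas as pd

# Hypothetical toy data with string row labels.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid"], "Age": [31, 25, 47]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "Age"]       # row label "b", column label "Age"
by_position = df.iloc[1, 1]         # second row, second column
scalar_label = df.at["b", "Age"]    # single value, by label
scalar_position = df.iat[1, 1]      # single value, by position

# Slices differ: loc includes the end label, iloc excludes the end position.
loc_slice = df.loc["a":"b", "Age"]  # rows "a" and "b"
iloc_slice = df.iloc[0:1, 1]        # row at position 0 only
```

All four scalar lookups return the same cell here (Bob's age), which shows that the accessors differ in how a cell is addressed, not in what they return.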

### 5. Boolean Indexing

### 6. Basic Information
   - **`df.shape`**
   - **`df.index`**
   - **`df.columns`**
   - **`df.info`**
   - **`df.count`**

### 7. Basic Operations
   - **`df.sum`**
   - **`df.min`, `df.max`**
   - **`df.describe`**

### 8. Inspection
   - **`df.head`, `df.tail`, `df.sample`**

### 9. Merging DataFrames
   - **`df.join`** vs **`df.merge`**

### 10. Grouping Data
   - **`df.groupby`**

### 11. Applying Functions
   - **`df.apply`, `df.transform`**

### 12. Querying Data
   - **`df.query`**

### 13. Modifying Data
   - **`df.replace`, `df.rename`**

### 14. Iterating Over DataFrames
   - **`df.iterrows`**

### 15. Pivot Tables
   - **`pd.pivot_table`**

### 16. Handling Missing Data
   - **`df.isnull`, `df.fillna`, `df.dropna`**

### 17. Concatenating DataFrames
   - **`pd.concat`**

### 18. Plotting Data
   - **`df.plot`, `df.hist`, ...**

### 19. MultiIndex

---

# Questions

1. What is the difference between a `np.array` and a `pd.Series`?
   
2. What is the most efficient way to iterate over rows of a pandas DataFrame? How about columns?
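 - `df.itertuples()` is the fastest of the row iterators and works well on large datasets.
 - `df.iterrows()` is slower than `itertuples()` because it builds a Series for each row; some find it more readable, so it is fine for small datasets.
 - Vectorized operations such as `df['Age'] = df['Age'] + 1` avoid the loop entirely and are usually the fastest option.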
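
The three approaches can be sketched on the same task, adding 1 to every age. The data is made up, and the last line uses `df.items()` as one possible way to iterate over columns, which the notes above do not cover.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [31, 25, 47]})

# itertuples: each row is a namedtuple; fields are accessed as attributes.
ages_from_tuples = [row.Age + 1 for row in df.itertuples(index=False)]

# iterrows: each row is an (index, Series) pair.
ages_from_rows = [row["Age"] + 1 for _, row in df.iterrows()]

# Vectorized: one operation on the whole column, no Python-level loop.
ages_vectorized = (df["Age"] + 1).tolist()

# Columns: items() yields (column label, Series) pairs.
column_names = [label for label, column in df.items()]
```

All three row-wise variants produce the same values; they differ only in how much per-row Python overhead they incur.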


3. What is the difference between querying and boolean indexing?
4. Is a pandas DataFrame mutable or immutable?
 - Mutable: columns and values can be added, changed, or removed on an existing DataFrame.
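
A short sketch of what mutability means in practice, on made-up data. The `backup` name is illustrative; `df.copy()` is used to show that a copy is unaffected by later changes.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({"Age": [31, 25, 47]})
backup = df.copy()             # independent copy (deep by default)

df["Age"] = df["Age"] + 1      # replace a column on the same object
df.loc[0, "Age"] = 100         # overwrite a single cell
df["Adult"] = df["Age"] >= 18  # add a new column

print(df["Age"].tolist())      # [100, 26, 48]
print(backup["Age"].tolist())  # [31, 25, 47]
```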

5. Why do you need `loc` and `iloc`? Can't you index directly?
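 - Indexing a DataFrame directly with a label, as in `df['Age']`, selects a column, not a row; `loc` and `iloc` make row selection and combined row/column selection explicit.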
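
The sketch below illustrates why direct indexing alone is not enough. The data and the integer row labels (10, 20, 30) are assumptions chosen so that row labels cannot be confused with positions.

```python
import pandas as pd

# Hypothetical toy data with integer row labels.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid"], "Age": [31, 25, 47]},
    index=[10, 20, 30],
)

age_column = df["Age"]          # a column label selects a column
over_30 = df[df["Age"] > 30]    # a boolean mask filters rows

# A bare scalar key is looked up among the column labels, not the rows,
# so asking for row label 10 this way fails.
try:
    df[10]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

row_by_label = df.loc[10]       # row whose index label is 10
row_by_position = df.iloc[0]    # first row, whatever its label
```

Direct indexing changes meaning depending on the key type, while `loc` and `iloc` always state whether labels or positions are meant.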

6. What are all the different ways a missing value can be represented?
 - NaN (default for numeric data).
 - None (Python's NoneType, converted to NaN in numeric columns).
 - NaT (for datetime columns).
 - Empty string ('') (often used in object columns).
 - Custom placeholders (e.g., 'NA', 'missing').
 - pd.NA (for nullable types).
 - np.inf (infinity, not strictly missing but can be used as a stand-in).
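
A quick way to see which of these representations pandas itself treats as missing is to run `isna()` on each. This is a sketch on made-up values; the placeholder string `'missing'` is just an example.

```python
import numpy as np
import pandas as pd

# None becomes NaN in a float column.
numeric = pd.Series([1.0, None, np.nan])
# NaT marks a missing timestamp.
dates = pd.Series(pd.to_datetime(["2024-01-01", None]))
# pd.NA is used by nullable dtypes such as "Int64".
nullable = pd.Series([1, None], dtype="Int64")
# Empty strings, custom placeholders, and infinity are regular values.
text = pd.Series(["", "missing", None])
infinite = pd.Series([np.inf, 1.0])

numeric_missing = numeric.isna().tolist()    # [False, True, True]
dates_missing = dates.isna().tolist()        # [False, True]
nullable_missing = nullable.isna().tolist()  # [False, True]
text_missing = text.isna().tolist()          # [False, False, True]
infinite_missing = infinite.isna().tolist()  # [False, False]

# Placeholders must be converted explicitly before isna() can detect them.
cleaned = text.mask(text.isin(["", "missing"]))
cleaned_missing = cleaned.isna().tolist()    # [True, True, True]
```

Only `NaN`, `None`, `NaT`, and `pd.NA` are detected automatically; the other stand-ins need an explicit conversion step such as the `mask` call above.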

# Exercises

### 1. Load the Titanic Dataset
   - Load the Titanic dataset into a pandas DataFrame.
   - Inspect the dataset and gather as much information as you can:
     - What are the columns in the dataset, and how many rows are there?
     - How many missing values are present, and how are they distributed? How will you handle them – filling or dropping?
     - Which variables are relevant, and which are not?
     - What are the variable types? Categorize them (discrete, continuous).
     - Obtain basic statistics.
     - Create basic visualizations.
     - Is there any additional information you can think of?

### 2. More Advanced Statistics
   - Retrieve statistics separated by sex and class.
   - Obtain statistics categorized by age.
   - Create visualizations of the statistics with relevant graphs.
   - Plot the cumulative survival rate as a function of ticket price.
   - Plot the cumulative survival rate as a function of passenger age.
   - Plot the cumulative survival rate as a function of ticket price *and* passenger age (does this make sense?).

### 3. Continue Exploring Your Dataset
   - Examine the number of unique values in each column.
   - Explore the possibility of pivoting the dataset.

### 4. Make Decisions
   - Remove any irrelevant data.
   - Treat missing values.
   - Can you think of any interesting feature engineering?

### 5. Are You Ready to Make Predictions?
   - Try it!

### 6. Obtain the Names and Ticket Numbers of All First-Class Teenagers.

### 7. What Is the Type of Variable `sex`?
   - Turn it into something more useful for future machine learning models.

### 8. Load the Boeing Historical Airplane Orders & Deliveries Dataset
   - How many planes were in the building process for each month of the period covered by the dataset?
