# **Exploring Pandas: Common Data Operations**

Welcome to this Jupyter Notebook! 🚀 In this notebook, you'll practice some of the most commonly used operations in the **Pandas** library using **two datasets**:
1. **../../data/students.csv** (CSV)
2. **../../data/enrollments.json** (JSON)

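The paths above are relative to this notebook's folder. If you keep the files somewhere else (for example, right next to the notebook), adjust the paths when you load them. By the end, you'll have a strong grasp of common data manipulation tasks, and you'll even merge these two datasets on a common key.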

Before starting, make sure you have **Pandas** installed. (It should come preinstalled in Anaconda!)

If pandas is not installed, follow the instructions below.

---

## **Checking if Pandas is Installed in Your Conda Environment**

Before proceeding, check if Pandas is installed in your Conda environment by running the following command in a **Jupyter Notebook** cell:

In [1]:
import pandas as pd
print(pd.__version__)

2.2.2


If this runs without errors and prints a version number, Pandas is installed. If you see an **ImportError**, install Pandas using one of the following methods:

### **For Conda Users (Recommended)**
Run this in your terminal or Anaconda Prompt:
```
conda install pandas
```

### **Using Conda-Forge (If Needed)**
If you encounter issues, you can install Pandas from **Conda-Forge**, a community-maintained repository with up-to-date packages:
```
conda install -c conda-forge pandas
```

### **For Pip Users**
If you're using a virtual environment outside Conda, install Pandas via Pip:
```
pip install pandas
```

# Now, let's dive in! 🏊‍♂️

---


## **1. Load a CSV file into a Pandas DataFrame**

First, let's **import Pandas** and load the datasets. Two datasets have been prepared for you:

- `students.csv`
- `enrollments.json`

You will use these two datasets for the following challenges.

**💡 Hint:** If the file is in the same directory as your notebook, you can just use the filename. Otherwise, provide the relative or full file path.
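
As a minimal sketch of the loading step, the example below reads CSV and JSON text from in-memory buffers so it runs without the real files. The sample rows are made up, and the record-style JSON layout is an assumption about `enrollments.json`. The names `stu` and `enr` match the ones used in the merge step later on.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the real files; the values are invented.
csv_text = "student_id,First_Name,last_name,current gpa\n1,ana,lopez,3.6\n2,ben,kim,3.1\n"
json_text = '[{"stud_ref_id": 1, "course_title": "Statistics", "course_fee": 1200}]'

# read_csv and read_json accept a file path or a file-like object.
stu = pd.read_csv(io.StringIO(csv_text))
enr = pd.read_json(io.StringIO(json_text))

# With the real files you would pass paths instead, for example:
# stu = pd.read_csv("../../data/students.csv")
# enr = pd.read_json("../../data/enrollments.json")
print(stu)
print(enr)
```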


In [2]:
# your code here

In [3]:
# your code here

## **2. View the First and Last Few Rows of Each DataFrame**

Check out how your data looks. One method previews the first few records, while another method previews the last few. You can specify the number of rows you want to see by explicitly passing an integer argument.

**📝 Tip:** This is a great time to confirm that columns loaded correctly and to spot any obvious data issues (strange values, mismatched columns, etc.).
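
One way to preview rows is with `head()` and `tail()`. The sketch below uses a small made-up DataFrame in place of the real students data.

```python
import pandas as pd

# Hypothetical sample data standing in for the students DataFrame.
stu = pd.DataFrame({
    "student_id": list(range(1, 11)),
    "current_gpa": [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9],
})

first_rows = stu.head()   # first 5 rows by default
last_rows = stu.tail(3)   # last 3 rows, because we passed 3
print(first_rows)
print(last_rows)
```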

In [4]:
# your code here

In [5]:
# your code here

In [6]:
# your code here

In [7]:
# your code here

## **3. Check the Shape of Each DataFrame**

To understand the **size** of your dataset(s), use the attribute that returns `(number_of_rows, number_of_columns)`.

**📝 Tip:** Note any big differences in row counts that might affect merging later.
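
A short sketch of reading the size of each DataFrame with the `shape` attribute. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data.
stu = pd.DataFrame({"student_id": [1, 2, 3], "last_name": ["Lopez", "Kim", "Nguyen"]})
enr = pd.DataFrame({"stud_ref_id": [1, 1, 2, 3], "course_title": ["Stats", "Algebra", "Stats", "Art"]})

# shape is an attribute (no parentheses) holding (rows, columns).
n_rows, n_cols = stu.shape
print(stu.shape, enr.shape)
```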


In [8]:
# your code here


In [9]:
# your code here

## **4. Get a Summary of Each DataFrame**

Explore one or two approaches that provide:

-   **Column names**
-   **Data types**
-   **Basic statistics about numerical columns**
-   **Number of non-null values**

**📝 Tip:** One approach might give an overview of columns and data types; another might summarize numerical columns. This step helps you detect columns that might need cleaning.
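
Two methods that likely fit this step are `info()` and `describe()`. The sketch below shows both on a made-up DataFrame.

```python
import pandas as pd

# Hypothetical sample data with one missing last name.
stu = pd.DataFrame({
    "student_id": [1, 2, 3],
    "last_name": ["Lopez", "Kim", None],
    "current_gpa": [3.6, 3.1, 2.8],
})

# info() prints column names, dtypes, and non-null counts.
stu.info()

# describe() summarizes numeric columns (count, mean, std, min, quartiles, max).
summary = stu.describe()
print(summary)
```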

In [10]:
# your code here

In [11]:
# your code here

In [12]:
# your code here

In [13]:
# your code here

## **5. Check for Missing Values**

Determine if your dataset has any missing or null values by **counting** them. Notice which columns have many missing entries and plan how to handle them.

**📝 Tip:** Some columns might look present but contain empty strings. Identify them if possible.
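
The following sketch counts nulls per column and, separately, blank strings in one text column, since an empty string is not treated as missing. The rows are invented.

```python
import pandas as pd

# Hypothetical sample data: one blank string and two true nulls.
enr = pd.DataFrame({
    "stud_ref_id": [1, 2, 3],
    "instructor_name": ["Dr. Rao", "", None],
    "course_fee": [1200.0, None, 900.0],
})

# isna() marks missing cells; sum() counts them per column.
missing_counts = enr.isna().sum()

# Empty or whitespace-only strings are not NaN, so count them separately.
blank_count = enr["instructor_name"].str.strip().eq("").sum()

print(missing_counts)
print("blank instructor names:", blank_count)
```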


In [14]:
# your code here

In [15]:
# your code here

In [16]:
# your code here

## **6. Rename Columns for Clarity and Consistency**

Some columns may have **spaces** or **capitalization** that complicates your analysis. For example, if you see `"current gpa"` or `"First_Name"`, consider renaming them (e.g., `"current_gpa"`, `"first_name"`) for ease of use.

**📝 Tip:** Consistent naming conventions help minimize typos and KeyErrors.
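
A sketch of two renaming approaches: a targeted mapping with `rename()`, and a bulk cleanup of every column label. The column list is assembled from names mentioned in this notebook and may not match the real file exactly.

```python
import pandas as pd

# Empty DataFrame with illustrative column names.
stu = pd.DataFrame(columns=["student_id", "First_Name", "last_name", "current gpa", "majorField"])

# Targeted: rename only the columns listed in the mapping.
renamed = stu.rename(columns={"current gpa": "current_gpa", "First_Name": "first_name"})

# Bulk: strip whitespace, lowercase, and replace spaces with underscores.
# Note that lowercasing turns "majorField" into "majorfield".
bulk = stu.copy()
bulk.columns = bulk.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)

print(list(renamed.columns))
print(list(bulk.columns))
```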

In [17]:
# your code here

In [18]:
# your code here

In [19]:
# your code here

In [20]:
# your code here

## **7. Convert Data Types Where Needed**

Check which columns should be numeric or datetime. Columns like `admission_year` or `course_fee` might be read as **strings** by default. Convert them to numerical or date formats if necessary.

**📝 Tip:** Make sure you handle errors gracefully (e.g., set `errors='coerce'` to turn invalid entries into NaN).
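
A minimal sketch of numeric conversion with `pd.to_numeric` and `errors="coerce"`. The invalid entries (`"N/A"`, `"unknown"`) are invented to show the coercion; the real files may contain different bad values.

```python
import pandas as pd

# Hypothetical string columns containing one unparseable value each.
enr = pd.DataFrame({"course_fee": ["1200", "950.50", "N/A"]})
stu = pd.DataFrame({"admission_year": ["2021", "2022", "unknown"]})

# Unparseable strings become NaN instead of raising an error.
enr["course_fee"] = pd.to_numeric(enr["course_fee"], errors="coerce")

# The nullable "Int64" dtype keeps whole-number years while allowing missing values.
stu["admission_year"] = pd.to_numeric(stu["admission_year"], errors="coerce").astype("Int64")

print(enr.dtypes)
print(stu.dtypes)
```

For date-like columns, `pd.to_datetime(..., errors="coerce")` follows the same pattern.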

In [21]:
# your code here

In [22]:
# your code here

In [23]:
# your code here

## **8. Fill Missing Values with a Specified Value or Method**

Instead of **dropping** missing values, consider **replacing** them. For instance:

-   A string like `"Unknown"` for missing text
-   A **mean** or **median** for missing numeric columns
-   A **forward** or **backward fill** if appropriate
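
The three options above could look like the sketch below. The `term` column name is hypothetical, and the values are made up.

```python
import pandas as pd

# Hypothetical sample data with gaps in every column.
enr = pd.DataFrame({
    "instructor_name": ["Dr. Rao", None, "Dr. Chen"],
    "course_fee": [1000.0, None, 2000.0],
    "term": ["Spring 2026", None, None],  # hypothetical column name
})

# Text: replace missing values with a placeholder string.
enr["instructor_name"] = enr["instructor_name"].fillna("Unknown")

# Numeric: replace missing values with the column median.
enr["course_fee"] = enr["course_fee"].fillna(enr["course_fee"].median())

# Ordered data: carry the last seen value forward.
enr["term"] = enr["term"].ffill()

print(enr)
```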

In [24]:
# your code here

In [25]:
# your code here

In [26]:
# your code here

In [27]:
# your code here

## **9. Drop Rows or Columns with Missing Values (If Needed)**

After considering which values can be filled, you might choose to **remove** rows or columns that are missing too much data or can't be fixed.

**📝 Tip:** Decide carefully and confirm you don't need the dropped information. Passing `inplace=True` modifies the DataFrame itself, so assign the result to a separate DataFrame if you want to preserve the original data.
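
A sketch of `dropna()` that leaves the original DataFrame untouched by assigning results to new names. The `nickname` column is hypothetical and exists only to show dropping an all-empty column.

```python
import pandas as pd

# Hypothetical sample data.
stu = pd.DataFrame({
    "student_id": [1, 2, 3],
    "last_name": ["Lopez", None, "Kim"],
    "nickname": [None, None, None],  # hypothetical, entirely empty column
})

# Drop columns in which every value is missing.
no_empty_cols = stu.dropna(axis="columns", how="all")

# Drop rows that are missing a value in specific columns only.
complete_names = no_empty_cols.dropna(subset=["last_name"])

print(complete_names)
print(stu.shape)  # the original is unchanged
```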

In [28]:
# your code here

In [29]:
# your code here

In [30]:
# your code here

In [31]:
# your code here

## **10. Filter Rows Based on a Condition**

Now that columns like `admission_year` and `course_fee` (or `current_gpa`) are numeric, experiment with filtering. For example:

-   Students whose `admission_year` is after a certain year
-   Enrollments for `Spring 2026`
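
Filters like these are usually written as boolean masks. In the sketch below, the `term` column name is an assumption about where a value such as `Spring 2026` is stored, and the rows are invented.

```python
import pandas as pd

# Hypothetical sample data.
stu = pd.DataFrame({"student_id": [1, 2, 3], "admission_year": [2020, 2022, 2023]})
enr = pd.DataFrame({
    "stud_ref_id": [1, 2, 3],
    "term": ["Fall 2025", "Spring 2026", "Spring 2026"],  # hypothetical column name
    "course_fee": [900.0, 1200.0, 1500.0],
})

# A comparison yields a boolean Series; indexing with it keeps the True rows.
recent = stu[stu["admission_year"] > 2021]
spring = enr[enr["term"] == "Spring 2026"]

# Combine conditions with & (and) or | (or), wrapping each in parentheses.
pricey_spring = enr[(enr["term"] == "Spring 2026") & (enr["course_fee"] > 1300)]

print(recent)
print(pricey_spring)
```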

In [32]:
# your code here

In [33]:
# your code here

In [34]:
# your code here

In [35]:
# your code here

## **11. Select Specific Columns from Each DataFrame**

Often, you don't need all columns at once. For instance, you might extract only:

-   `"student_id"`, `"First_Name"`, `"last_name"`, and `"current_gpa"` from `students.csv`
-   `"stud_ref_id"`, `"course_title"`, `"instructor_name"`, `"course_fee"` from `enrollments.json`
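
A sketch of column selection on made-up data. Passing a list of names returns a DataFrame, while a single name returns a Series. If you renamed columns in step 6, use the new names instead.

```python
import pandas as pd

# Hypothetical sample data using column names mentioned in this notebook.
stu = pd.DataFrame({
    "student_id": [1, 2],
    "First_Name": ["Ana", "Ben"],
    "last_name": ["Lopez", "Kim"],
    "current_gpa": [3.6, 3.1],
    "majorField": ["Math", "Biology"],
})

# Double brackets: a list of column names gives a DataFrame.
subset = stu[["student_id", "First_Name", "last_name", "current_gpa"]]

# Single brackets with one name give a Series.
ids = stu["student_id"]

print(subset)
```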

In [36]:
# your code here

In [37]:
# your code here

In [38]:
# your code here

In [39]:
# your code here

## **12. Sort the DataFrame by One or More Columns**

Sorting can help you identify which records have the highest or lowest values. For example:

-   Sort the **students** DataFrame by `"current_gpa"` in descending order
-   Sort the **enrollments** DataFrame by `"course_fee"` in ascending order

**📝 Tip:** You can sort by multiple columns if needed.
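
A sketch of `sort_values()` with one column and with several columns. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data.
stu = pd.DataFrame({"student_id": [1, 2, 3], "current_gpa": [3.1, 3.9, 2.8]})
enr = pd.DataFrame({
    "course_title": ["A", "B", "C"],
    "instructor_name": ["Dr. Rao", "Dr. Chen", "Dr. Rao"],
    "course_fee": [900.0, 1200.0, 1500.0],
})

# Highest GPA first.
by_gpa = stu.sort_values("current_gpa", ascending=False)

# Instructor A to Z, then the most expensive course first within each instructor.
by_instructor_fee = enr.sort_values(["instructor_name", "course_fee"], ascending=[True, False])

print(by_gpa)
print(by_instructor_fee)
```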


In [40]:
# your code here

In [41]:
# your code here

In [42]:
# your code here

In [43]:
# your code here

## **13. Group Data by a Column and Compute Aggregate Functions**

Grouping lets you see aggregated info by category. For example, group **students** by `"majorField"` and compute the average `"current_gpa"`. In **enrollments**, group by `"instructor_name"` and compute the average `"course_fee"`.

**📝 Tip:** Aggregations might include `.mean()`, `.sum()`, `.count()`, etc.
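
A sketch of `groupby()` with a single aggregate and with several aggregates at once. The values are made up.

```python
import pandas as pd

# Hypothetical sample data.
stu = pd.DataFrame({
    "majorField": ["Math", "Math", "Biology"],
    "current_gpa": [3.0, 4.0, 3.5],
})
enr = pd.DataFrame({
    "instructor_name": ["Dr. Rao", "Dr. Rao", "Dr. Chen"],
    "course_fee": [1000.0, 2000.0, 1500.0],
})

# One aggregate: average GPA per major.
avg_gpa = stu.groupby("majorField")["current_gpa"].mean()

# Several aggregates: average fee and number of courses per instructor.
fee_stats = enr.groupby("instructor_name")["course_fee"].agg(["mean", "count"])

print(avg_gpa)
print(fee_stats)
```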

In [44]:
# your code here

In [45]:
# your code here

In [46]:
# your code here

In [47]:
# your code here

## **14. Apply a Custom Function**

Define a normal Python function to transform data in a column. For example, title-case a name or uppercase a field. Apply that function to each element in the column.

**📝 Tip:** If your function references another library call or has complex logic, define it above and then use `.apply(...)` with your function name. Once you've done this, see if you can do the same thing using lambda notation.
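
A sketch of the same transformation written three ways: a named function, a lambda, and the vectorized `.str` accessor. The helper name `tidy_name` and the sample values are illustrative only.

```python
import pandas as pd

def tidy_name(value):
    """Strip surrounding whitespace and title-case a name."""
    return value.strip().title()

# Hypothetical sample data with messy casing and spacing.
stu = pd.DataFrame({"last_name": ["  lopez", "KIM ", "nguyen"]})

via_function = stu["last_name"].apply(tidy_name)
via_lambda = stu["last_name"].apply(lambda v: v.strip().title())

# For simple string operations, the .str accessor avoids apply altogether.
via_str = stu["last_name"].str.strip().str.title()

print(via_function)
```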

In [48]:
# your code here

In [49]:
# your code here

In [50]:
# your code here

In [51]:
# your code here

## **15. Create a New Column Based on Existing Ones**

Use existing columns to generate new ones. For instance, combine `"First_Name"` and `"last_name"` into `"full_name"`, or compute `"fees_after_tax"` in enrollments if you assume a tax rate.
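
A sketch of both derived columns. The 8% tax rate is an arbitrary assumption, and the rows are invented.

```python
import pandas as pd

# Hypothetical sample data.
stu = pd.DataFrame({"First_Name": ["Ana", "Ben"], "last_name": ["Lopez", "Kim"]})
enr = pd.DataFrame({"course_fee": [1000.0, 500.0]})

# String columns concatenate element-wise with +.
stu["full_name"] = stu["First_Name"] + " " + stu["last_name"]

TAX_RATE = 0.08  # assumed rate, for illustration only
enr["fees_after_tax"] = enr["course_fee"] * (1 + TAX_RATE)

print(stu)
print(enr)
```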

In [52]:
# your code here

In [53]:
# your code here

In [54]:
# your code here

In [55]:
# your code here

## **16. Merge Two DataFrames on a Common Column**

Combine `students.csv` and `enrollments.json` by matching:

-   `stu["student_id"]`
-   `enr["stud_ref_id"]` (or rename it first)

Check the shape of the merged DataFrame afterward to ensure it merged as expected.
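
A sketch of merging on differently named key columns with `left_on` and `right_on`. The choice of a left join and the `indicator=True` flag are illustrative; pick the join type that suits your analysis. The data is made up.

```python
import pandas as pd

# Hypothetical sample data: student 3 has no enrollment,
# and enrollment 4 has no matching student.
stu = pd.DataFrame({"student_id": [1, 2, 3], "last_name": ["Lopez", "Kim", "Nguyen"]})
enr = pd.DataFrame({
    "stud_ref_id": [1, 1, 2, 4],
    "course_title": ["Stats", "Algebra", "Stats", "Art"],
})

# Left join keeps every student; indicator=True adds a _merge column
# showing whether each row matched ("both") or not ("left_only").
merged = stu.merge(enr, left_on="student_id", right_on="stud_ref_id", how="left", indicator=True)

# Inner join keeps only rows whose key exists in both DataFrames.
inner = stu.merge(enr, left_on="student_id", right_on="stud_ref_id", how="inner")

print(merged)
print(merged.shape, inner.shape)
```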


In [56]:
# your code here

In [57]:
# your code here

In [58]:
# your code here

In [59]:
# your code here

## **17. Remove Duplicate Rows**

When merging or concatenating multiple files, duplicates can crop up. Identify them and remove if needed. This might be especially important if the same student or enrollment is listed more than once.
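
A sketch of finding and removing duplicates, first across whole rows and then by a subset of columns. The rows are invented.

```python
import pandas as pd

# Hypothetical sample data: the second row repeats the first exactly.
enr = pd.DataFrame({
    "stud_ref_id": [1, 1, 2, 2],
    "course_title": ["Stats", "Stats", "Stats", "Algebra"],
})

# duplicated() is True for rows that repeat an earlier row.
dupe_mask = enr.duplicated()

# Remove exact duplicate rows.
deduped = enr.drop_duplicates()

# Keep only the first row for each student.
one_per_student = enr.drop_duplicates(subset=["stud_ref_id"], keep="first")

print(dupe_mask.sum(), "duplicate row(s)")
print(deduped)
```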

In [60]:
# your code here

In [61]:
# your code here

In [62]:
# your code here

In [63]:
# your code here

## **18. Additional Data Cleaning**

Now that you've merged or manipulated your data, do a quick final pass:

-   Fix any remaining oddities (e.g., negative phone numbers or impossible dates)
-   Normalize columns further (e.g., standardize text formatting)

**📝 Tip:** You might revisit previous steps if new issues appear.
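
A sketch of two typical final-pass fixes: standardizing text formatting and blanking out impossible values. Treating a negative `course_fee` as invalid is an assumption made for illustration; the oddities in your data may differ.

```python
import pandas as pd

# Hypothetical sample data with inconsistent text and one impossible fee.
enr = pd.DataFrame({
    "instructor_name": ["  dr. rao", "DR. RAO ", "Dr. Chen"],
    "course_fee": [1200.0, -50.0, 900.0],
})

# Standardize text so the same instructor is not counted as several.
enr["instructor_name"] = enr["instructor_name"].str.strip().str.title()

# Mark impossible values as missing so they can be filled or dropped later.
enr.loc[enr["course_fee"] < 0, "course_fee"] = float("nan")

print(enr)
```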


In [64]:
# your code here

In [65]:
# your code here

In [66]:
# your code here

In [67]:
# your code here

## **19. Save the Cleaned and Merged DataFrame to a New CSV File**

Finally, when you're satisfied with your cleaned data, save it. Remember to avoid writing the index as a separate column unless you want it.
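
A sketch of `to_csv()` with `index=False`. To stay runnable without touching the file system, it omits the path so that the CSV text is returned as a string. The output filename in the comment is hypothetical.

```python
import pandas as pd

# Hypothetical cleaned and merged data.
merged = pd.DataFrame({
    "student_id": [1, 2],
    "course_title": ["Statistics", "Algebra"],
    "course_fee": [1200.0, 900.0],
})

# Without a path, to_csv returns the CSV text; index=False omits the row index.
csv_text = merged.to_csv(index=False)
print(csv_text)

# To write a file instead, pass a path (hypothetical filename):
# merged.to_csv("students_enrollments_clean.csv", index=False)
```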

In [68]:
# your code here

In [69]:
# your code here

In [70]:
# your code here

In [71]:
# your code here

## **20. Explore Further Analyses (Optional)**

Now that your data is in great shape, try some optional challenges:

-   Generate charts or visualizations
-   Perform advanced filtering or grouping
-   Create pivot tables
-   Or anything else that interests you!
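
As one possible starting point for the pivot table idea, the sketch below averages fees per instructor and term. The `term` column name is hypothetical, and the data is made up.

```python
import pandas as pd

# Hypothetical sample data.
enr = pd.DataFrame({
    "instructor_name": ["Dr. Rao", "Dr. Rao", "Dr. Chen"],
    "term": ["Fall 2025", "Spring 2026", "Spring 2026"],  # hypothetical column name
    "course_fee": [1000.0, 1200.0, 900.0],
})

# Rows are instructors, columns are terms, cells hold the average fee.
# Combinations with no data appear as NaN.
pivot = enr.pivot_table(index="instructor_name", columns="term", values="course_fee", aggfunc="mean")

print(pivot)
```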

In [72]:
# your code here

In [73]:
# your code here

In [74]:
# your code here

In [75]:
# your code here

**🎉 Congratulations!** You've now tackled **data cleaning** and many essential **Pandas** operations on `students.csv` and `enrollments.json`. Keep experimenting to sharpen your **data manipulation skills** and unlock deeper insights! 💪