> - Import and Load Data: Get your dataset into a DataFrame.

> - Initial Exploration: Understand the basic structure and contents of the data.

> - Data Cleaning: Address missing values, duplicates, and incorrect data types.

> - Data Transformation: Modify and create new features.

> - Exploratory Data Analysis (EDA): Analyze and visualize data for insights.

> - Feature Engineering: Prepare features for analysis or modeling.

> - Data Indexing and Selection: Organize and access your data efficiently.

> - Final Checks and Save Data: Ensure data integrity and save the cleaned dataset.

> These steps should guide you through working with datasets in pandas effectively.

 1. **Import pandas and Load Data**

   - **Import pandas**: Import the pandas library.
   - **Load Dataset**: Use `pd.read_csv()`, `pd.read_excel()`, or other pandas functions to load your dataset into a DataFrame.

   ```python
   import pandas as pd

   # Load dataset
   df = pd.read_csv('data.csv')
   ```
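To try the loading step without a file on disk, here is a minimal sketch that feeds `pd.read_csv()` an in-memory buffer. The CSV text and its column names (`user_id`, `signup_date`, `plan`, `amount`) are made up for illustration; `parse_dates` is a standard `read_csv` parameter.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file such as 'data.csv'
csv_text = """user_id,signup_date,plan,amount
1,2024-01-05,basic,10.0
2,2024-01-09,pro,25.5
3,2024-02-11,basic,
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["signup_date"])

print(df.shape)   # (rows, columns)
print(df.dtypes)  # the empty 'amount' field is read as NaN, so the column is float
```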

 2. **Initial Exploration**

   - **View Basic Data**:
     - `df.head()`: View the first few rows.
     - `df.tail()`: View the last few rows.
     - `df.info()`: Check data types and non-null counts.
     - `df.describe()`: Get summary statistics for numerical columns.

   ```python
   print(df.head())
   df.info()  # info() prints its report itself and returns None
   print(df.describe())
   ```

 3. **Data Cleaning**

   - **Handle Missing Values**:
     - `df.isnull().sum()`: Identify missing values.
     - `df.dropna()`: Remove rows with missing values.
     - `df.fillna(value)`: Fill missing values with a specified value.

   - **Remove Duplicates**:
     - `df.drop_duplicates()`: Remove duplicate rows.

   - **Correct Data Types**:
     - Convert columns to appropriate data types using `df.astype()`.

   ```python
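   df = df.drop_duplicates()
   # Choose one strategy for missing values: fill them (as here) or drop them
   df = df.ffill()  # forward-fill; alternatively, df = df.dropna()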
   ```
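The cleaning calls listed in this step can be tried end to end on a small frame. The sketch below uses made-up data (the column names `name`, `age`, and `score` are assumptions) and fills missing scores with the column mean, which is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: missing scores, one duplicated row,
# and a numeric column stored as strings
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy"],
    "age": ["31", "45", "45", "28"],
    "score": [88.0, np.nan, np.nan, 92.5],
})

missing_per_column = df.isnull().sum()  # missing values per column
df = df.drop_duplicates()               # the two "Bob" rows are identical
df["age"] = df["age"].astype(int)       # strings -> integers
df["score"] = df["score"].fillna(df["score"].mean())  # fill with the column mean

print(missing_per_column)
print(df)
```

Counting missing values before changing anything makes it easier to judge whether dropping or filling is the less destructive choice.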

 4. **Data Transformation**

   - **Rename Columns**:
     - `df.rename(columns={'old_name': 'new_name'})`: Rename columns if necessary.

   - **Create New Features**:
     - Use pandas operations to create new columns based on existing ones.

   - **Filter Data**:
     - Use conditional statements to filter rows based on criteria.

   ```python
   df['new_column'] = df['existing_column'] * 10
   filtered_df = df[df['column'] > 100]
   ```
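As a self-contained illustration of renaming, deriving a column, and filtering together, consider this sketch. The order data and the threshold of 50 are invented for the example.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({"qty": [2, 5, 12], "unit_price": [10.0, 4.0, 9.5]})

# rename returns a new DataFrame, so assign the result back
df = df.rename(columns={"qty": "quantity"})

# New feature computed element-wise from two existing columns
df["total"] = df["quantity"] * df["unit_price"]

# Boolean mask: keep only rows whose total exceeds 50
large_orders = df[df["total"] > 50]

print(df)
print(large_orders)
```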

5. **Exploratory Data Analysis (EDA)**

   - **Analyze Column Values**:
     - Use methods like `df['column'].value_counts()` to explore unique values and their frequencies.

   - **Group Data**:
     - `df.groupby('column').mean(numeric_only=True)`: Compute per-group means of the numeric columns.

   ```python
   print(df['column'].value_counts())
   print(df.groupby('column').mean(numeric_only=True))
   ```
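A small worked example may make the two EDA calls more concrete. The data below is hypothetical; selecting a single column before aggregating keeps non-numeric columns out of the calculation.

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "category": ["paint", "canvas", "paint", "brush", "paint"],
    "price": [12.0, 30.0, 8.0, 5.0, 10.0],
})

# Frequency of each category, most common first
counts = df["category"].value_counts()

# Several aggregates per group for one numeric column
summary = df.groupby("category")["price"].agg(["mean", "min", "max"])

print(counts)
print(summary)
```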

 6. **Feature Engineering**

   - **Encode Categorical Variables**:
     - Convert categorical variables to numeric using methods like `pd.get_dummies()`.

   ```python
   df_encoded = pd.get_dummies(df, columns=['categorical_column'])
   ```
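To see what one-hot encoding produces, here is a minimal sketch on invented data. Passing `dtype=int` is a choice made for readability here, so the indicator columns show 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical data with one categorical column
df = pd.DataFrame({"size": ["S", "M", "L", "M"], "price": [10, 12, 15, 12]})

# Each distinct value of 'size' becomes its own indicator column
encoded = pd.get_dummies(df, columns=["size"], dtype=int)

print(encoded)
```

The original `size` column is replaced by `size_L`, `size_M`, and `size_S`, while `price` is left untouched.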

 7. **Data Indexing and Selection**

   - **Set Index**:
     - `df.set_index('column')`: Set a column as the DataFrame index if needed.

   - **Select Data**:
     - Use `.loc[]` for label-based indexing and `.iloc[]` for integer position-based indexing.

   ```python
   df = df.set_index('user_id')
   row = df.loc[1]  # Label-based selection
   ```
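The difference between `.loc[]` and `.iloc[]` is easiest to see when index labels are not 0, 1, 2. The following sketch uses made-up user records to contrast the two.

```python
import pandas as pd

# Hypothetical user table indexed by user_id
df = pd.DataFrame({
    "user_id": [101, 102, 103],
    "name": ["Ann", "Bob", "Cy"],
    "age": [31, 45, 28],
}).set_index("user_id")

by_label = df.loc[102]     # row whose index label is 102
by_position = df.iloc[0]   # first row, whatever its label is

# .loc also accepts a boolean mask plus a list of columns
subset = df.loc[df["age"] > 30, ["name"]]

print(by_label["name"], by_position["name"])
print(subset)
```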

 8. **Final Checks and Save Data**

   - **Verify Changes**:
     - Ensure all transformations and cleaning steps are correct.

   - **Save Data**:
     - Save the DataFrame to a new file if needed using `df.to_csv()` or other appropriate methods.

   ```python
   # index=False omits the index; use index=True if it holds data such as user_id
   df.to_csv('cleaned_data.csv', index=False)
   ```
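One way to make "verify changes" concrete is to assert a few invariants and then confirm the saved file reads back unchanged. This is a sketch under assumptions: the data is invented, the two checks are examples rather than a complete list, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned data
df = pd.DataFrame({"user_id": [1, 2], "amount": [10.0, 25.5]})

# Example integrity checks before saving
assert df.isnull().sum().sum() == 0   # no missing values remain
assert not df.duplicated().any()      # no duplicate rows remain

# Save, then read the file back to confirm the round trip
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cleaned_data.csv")
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.equals(df))
```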


**Sequence:** These steps flow logically from loading the data to saving the result. Each step builds on the previous ones, so you are always working with clean, well-understood data.

**Flexibility:** This sequence is typical, but the exact order can vary with the requirements of your analysis. For instance, you might do some EDA before cleaning to understand the extent of missing values or outliers.
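As a rough sketch of that kind of early check, the snippet below measures the share of missing values and flags outliers on invented data. The 1.5 × IQR rule used here is a common heuristic chosen for the example, not something the steps above prescribe.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value and one extreme value
df = pd.DataFrame({"amount": [10.0, 12.0, 11.0, np.nan, 13.0, 250.0]})

# Fraction of rows where 'amount' is missing
missing_share = df["amount"].isnull().mean()

# Flag values outside 1.5 * IQR of the quartiles (an assumed heuristic)
q1 = df["amount"].quantile(0.25)
q3 = df["amount"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(f"missing share: {missing_share:.2f}")
print(outliers)
```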

Types of Data:
There are two main types of data in a dataset:

 1. **Numerical Data**
  >Numerical data consists of numbers and can be used for mathematical operations. It can be further classified into:

  **Discrete Data:** 
  >These are whole numbers that represent distinct values. For example, if Merry buys 2, 3, or 4 paint brushes, this is discrete data because you cannot have a fraction of a paint brush.

  **Continuous Data:**
  > This data can take any value within a range, including decimals. For example, the cost of art supplies could be $10, $10.21, or any other value in between.
  
 2. **Categorical Data**
  >Categorical data represents categories or groups and can be further classified into:

   **Nominal Data:** 
  >This type of data represents categories with no inherent order. Examples include types of art supplies (e.g., paint brushes, canvases) or customer names (e.g., Alice, Merry). The categories are distinct but not ordered.

  **Ordinal Data:**
  > This type of data represents categories with a meaningful order but no consistent difference between categories. For example, shirt sizes (Small, Medium, Large) are ordinal because they have a natural order. The sizes are not numerical but have a ranking.
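These four kinds of data map onto pandas dtypes roughly as sketched below. The column names and values are hypothetical, reusing the examples from the text; an ordered `Categorical` is one way to tell pandas that shirt sizes have a ranking.

```python
import pandas as pd

# Hypothetical data reusing the examples above
df = pd.DataFrame({
    "brushes_bought": [2, 3, 4],                 # discrete  -> integer dtype
    "cost": [10.0, 10.21, 15.5],                 # continuous -> float dtype
    "customer": ["Alice", "Merry", "Alice"],     # nominal
    "shirt_size": ["Small", "Large", "Medium"],  # ordinal
})

# Nominal: categories without any order
df["customer"] = df["customer"].astype("category")

# Ordinal: categories with an explicit order
df["shirt_size"] = pd.Categorical(
    df["shirt_size"], categories=["Small", "Medium", "Large"], ordered=True
)

print(df.dtypes)
print(df["shirt_size"].min(), df["shirt_size"].max())
```

Because the sizes are ordered, comparisons such as `df["shirt_size"] > "Small"` follow the declared ranking rather than alphabetical order.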



```python
import pandas as pd

# Data dictionary
data = {
    'Age': [99],  # List or array required for DataFrame creation
    'Name': ["Alina"]  # List or array required for DataFrame creation
}

# Create DataFrame
s = pd.DataFrame(data)

# Print DataFrame
print(s)
```

```
   Age   Name
0   99  Alina
```
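The comments in the snippet above say the dictionary values must be lists or arrays. The sketch below shows what happens with bare scalars, plus an alternative fix that passes an explicit `index`; the variable names are chosen for illustration.

```python
import pandas as pd

# All-scalar values give pandas no way to infer the number of rows
try:
    pd.DataFrame({"Age": 99, "Name": "Alina"})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Alternative to wrapping each value in a list: supply an index
fixed = pd.DataFrame({"Age": 99, "Name": "Alina"}, index=[0])

print(error_message)
print(fixed)
```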
