### Data Manipulation with pandas and NumPy


In [1]:
import pandas as pd

In [3]:
df = pd.read_csv('month.csv')
df.head(5)

Unnamed: 0,Name,Abbreviation,Numeric,Numeric-2
0,January,Jan.,1,1
1,Feburary,Feb.,2,2
2,March,Mar.,3,3
3,April,Apr.,4,4
4,May,May,5,5


In [4]:
df.tail(5)

Unnamed: 0,Name,Abbreviation,Numeric,Numeric-2
7,August,Aug.,8,8
8,September,Sept.,9,9
9,October,Oct.,10,10
10,November,Nov.,11,11
11,December,Dec.,12,12


In [5]:
df.isnull()

Unnamed: 0,Name,Abbreviation,Numeric,Numeric-2
0,False,False,False,False
1,False,False,False,False
2,False,False,False,False
3,False,False,False,False
4,False,False,False,False
5,False,False,False,False
6,False,False,False,False
7,False,False,False,False
8,False,False,False,False
9,False,False,False,False


In [7]:
df.isnull().any(axis=1)

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
dtype: bool

In [8]:
df.fillna(0)

Unnamed: 0,Name,Abbreviation,Numeric,Numeric-2
0,January,Jan.,1,1
1,Feburary,Feb.,2,2
2,March,Mar.,3,3
3,April,Apr.,4,4
4,May,May,5,5
5,June,June,6,6
6,July,July,7,7
7,August,Aug.,8,8
8,September,Sept.,9,9
9,October,Oct.,10,10
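
Because `month.csv` contains no missing values, the `isnull()` and `fillna(0)` calls above return nothing interesting. The following is a minimal sketch on a small hypothetical DataFrame (`demo`, with made-up values that are not part of `month.csv`) showing what the same calls do when a value is actually missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from month.csv): one missing entry in 'Value'.
demo = pd.DataFrame({"Name": ["a", "b", "c"], "Value": [10.0, np.nan, 20.0]})

# True for every row that has at least one missing value.
rows_with_nan = demo.isnull().any(axis=1)

# Replace missing values with a constant; this returns a new DataFrame.
filled_zero = demo.fillna(0)

# Replace missing values with the column mean (NaN is skipped when averaging).
filled_mean = demo["Value"].fillna(demo["Value"].mean())

print(rows_with_nan.tolist())         # [False, True, False]
print(filled_zero["Value"].tolist())  # [10.0, 0.0, 20.0]
print(filled_mean.tolist())           # [10.0, 15.0, 20.0]
```

Note that `fillna()` does not modify `demo` in place; the result has to be assigned to a name to be kept.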


In [11]:
df=df.rename(columns={'Name':'Month'})
df.head()

Unnamed: 0,Month,Abbreviation,Numeric,Numeric-2
0,January,Jan.,1,1
1,Feburary,Feb.,2,2
2,March,Mar.,3,3
3,April,Apr.,4,4
4,May,May,5,5


df['Value_New'] = df['Value'].fillna(df['Value'].mean()).astype(int)

df['Value New'] = df['Value'].apply(lambda x: x * 2)

The **`lambda`** part in your code is a way to create a small, anonymous (unnamed) function in Python. Let's break it down:

### **What is `lambda`?**

- **`lambda`** allows you to create a function **without naming it**.
- It's often used for short, simple operations where defining a full function with `def` would be unnecessary.

### **Syntax of `lambda`:**
```
lambda arguments: expression
```

- **`arguments`**: The input(s) to the function (in your case, `x`).
- **`expression`**: The operation you want to perform, which is returned automatically (no need for `return`).
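
To make the syntax concrete, here is a small sketch in plain Python, independent of pandas. The names `double` and `add` are chosen only for illustration:

```python
# A lambda bound to a name can be called like any other function.
double = lambda x: x * 2

# A lambda can take several arguments, separated by commas.
add = lambda a, b: a + b

print(double(10))  # 20
print(add(2, 3))   # 5
```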

### **In Your Code:**
```python
lambda x: x * 2
```

- **`x`**: This represents each individual value from the `'Value'` column in your DataFrame.
- **`x * 2`**: This doubles the value of `x`.

### **How It Works with `apply()`:**

- The **`apply()`** method of a Series calls this lambda function once for **each element** in the `'Value'` column and collects the results into a new Series.

So, for example:
- If **`x = 10`**, then **`lambda x: x * 2`** becomes **`10 * 2 = 20`**.
- If **`x = 20`**, it becomes **`20 * 2 = 40`**.

### **Equivalent Using `def`:**

The lambda function:
```python
lambda x: x * 2
```

Is the same as writing:
```python
def double(x):
    return x * 2
```

And applying it like:
```python
df['Value New'] = df['Value'].apply(double)
```

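For a short, one-off operation like this, **`lambda`** is more concise than a separate `def`. For plain arithmetic, the vectorized expression `df['Value'] * 2` gives the same result without calling a Python function for every element.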
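
As a quick check of the behavior described above, the sketch below applies the lambda to a hypothetical `'Value'` column (the data is invented for illustration) and compares the result with plain vectorized multiplication:

```python
import pandas as pd

# Hypothetical data used only for this illustration.
demo = pd.DataFrame({"Value": [10, 20, 30]})

# Element-wise: the lambda is called once per value in the column.
via_apply = demo["Value"].apply(lambda x: x * 2)

# Vectorized: pandas multiplies the whole column in one operation.
vectorized = demo["Value"] * 2

print(via_apply.tolist())            # [20, 40, 60]
print(via_apply.equals(vectorized))  # True
```

Both forms produce the same Series here; `apply()` becomes the more useful choice when the per-element logic cannot be written as a simple column expression.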

---


In [12]:
df1 = pd.DataFrame({'Key': ["a","b","c"] , "Value" : [1,2,3]})
df2 = pd.DataFrame({'Key': ["a","b","d"] , "Value" : [2,4,6]})

In [13]:
df1

Unnamed: 0,Key,Value
0,a,1
1,b,2
2,c,3


In [14]:
df2

Unnamed: 0,Key,Value
0,a,2
1,b,4
2,d,6


In [17]:
pd.merge(df1,df2,on='Key',how= 'inner')

Unnamed: 0,Key,Value_x,Value_y
0,a,1,2
1,b,2,4


In [18]:
pd.merge(df1,df2,on='Key',how= 'right')

Unnamed: 0,Key,Value_x,Value_y
0,a,1.0,2
1,b,2.0,4
2,d,,6


The `pd.merge()` function in **pandas** is used to combine two DataFrames based on a common column or index. Let's break down the specific code:

```python
pd.merge(df1, df2, on='Key', how='right')
```

### Components Explained:

1. **`df1` and `df2`**:  
   These are the two DataFrames you want to merge.

2. **`on='Key'`**:  
   This specifies the column name (`'Key'`) used to match rows. Both DataFrames must contain a column with this name.

3. **`how='right'`**:  
   This defines the type of merge:
   - A **right join** keeps **all rows from `df2`** (the right DataFrame) and only the matching rows from `df1` (the left DataFrame).
   - If a value in the `'Key'` column exists in `df2` but **not** in `df1`, the result will still include that row, but the columns from `df1` will have **NaN** (missing values).
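
To see how the `how` argument changes which keys survive, the sketch below rebuilds the `df1` and `df2` frames from cells 12 to 14 and runs the merge with each of the four common join types. The dictionary name `keys_by_how` is introduced here only for illustration:

```python
import pandas as pd

# Same frames as in the notebook cells above.
df1 = pd.DataFrame({"Key": ["a", "b", "c"], "Value": [1, 2, 3]})
df2 = pd.DataFrame({"Key": ["a", "b", "d"], "Value": [2, 4, 6]})

# Collect the keys that each join type keeps.
keys_by_how = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(df1, df2, on="Key", how=how)
    keys_by_how[how] = merged["Key"].tolist()
    print(how, keys_by_how[how])
```

The inner join keeps only `a` and `b`, the left join adds `c`, the right join adds `d`, and the outer join keeps all four keys.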

---

### Example:

**DataFrame 1 (`df1`):**

| Key | A   |
|-----|-----|
| 1   | X   |
| 2   | Y   |
| 3   | Z   |

**DataFrame 2 (`df2`):**

| Key | B   |
|-----|-----|
| 2   | P   |
| 3   | Q   |
| 4   | R   |

**Code:**

```python
result = pd.merge(df1, df2, on='Key', how='right')
```

**Result:**

| Key | A     | B   |
|-----|-------|-----|
| 2   | Y     | P   |
| 3   | Z     | Q   |
| 4   | NaN   | R   |

---

### Explanation of the Result:
1. **Keys 2 and 3**:  
   Both DataFrames have these keys, so the corresponding values from both `df1` and `df2` are merged.

2. **Key 4**:  
   Exists **only in `df2`**, so it's included in the result. Since `df1` has no matching value, column **A** shows **NaN**.

3. **Key 1**:  
   Exists **only in `df1`**, so it is **not included** in the result: a **right join** keeps only the keys that appear in `df2`.
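
The example tables above can be reproduced with the following sketch. It additionally passes `indicator=True`, which is not part of the original call; this option adds a `_merge` column recording whether each row was matched in both frames or came from one side only:

```python
import pandas as pd

# The two example tables from this section.
df1 = pd.DataFrame({"Key": [1, 2, 3], "A": ["X", "Y", "Z"]})
df2 = pd.DataFrame({"Key": [2, 3, 4], "B": ["P", "Q", "R"]})

# Right join; indicator=True (an addition for illustration) labels each row's origin.
result = pd.merge(df1, df2, on="Key", how="right", indicator=True)
print(result)

# Keys 2 and 3 are labelled 'both'; key 4 is 'right_only' and has NaN in column A.
```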

---
