# Pandas Questions

**Question:**  
What are the three main data structures in pandas?

**Answer:**  
- Series (1D)
- DataFrame (2D)
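- Panel (3D), which was deprecated and later removed from pandas; a DataFrame with a MultiIndex is the usual replacement for 3D data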

---

**Question:**  
What is the definition of `Series` in pandas?

**Answer:**  
`Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

---

**Question:**  
What can be used to create a pandas `Series`?

**Answer:**  
- ndarray
- Python dict
- scalar value
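
The three inputs listed above can be sketched as follows. The variable names and sample values are only illustrative assumptions:

```python
import numpy as np
import pandas as pd

# From a NumPy array: a default integer index (0, 1, 2, ...) is created
s_from_array = pd.Series(np.array([1.5, 2.5, 3.5]))

# From a dict: the keys become the index labels
s_from_dict = pd.Series({"a": 10, "b": 20})

# From a scalar: the value is repeated for every label in the index
s_from_scalar = pd.Series(7, index=["x", "y", "z"])

print(s_from_array)
print(s_from_dict)
print(s_from_scalar)
```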

---

**Question:**  
In the following example we want to create a pandas Series from a dictionary. What's the value for index `'b'` in the Series?
```
d = {'a': 10, 'c': 30, 'd': 40}
pd.Series(d, index=['a', 'b', 'c', 'd'])
```
**Answer:**  
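The value for index `'b'` will be `NaN` ("Not a Number"), the marker pandas uses for missing data, because the dictionary has no key `'b'`. As a side effect, the Series is stored as `float64`.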


---

**Question:**  
Imagine you want to create a pandas Series that holds the constant value 10 in four rows. Is the following code correct?
```
pd.Series(10)
```
**Answer:**  
No. The given code creates a single row with the value 10. To get four rows, pass an index with four labels, for example:
```
pd.Series(10, index=range(4))
```


---

**Question:**  
A pandas `Series` is ndarray-like. What does that mean?

**Answer:**  
A Series behaves much like an ndarray and is a valid argument to most NumPy functions that expect one. The main difference is that operations such as slicing also slice the index, so labels stay attached to their values.
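
A minimal sketch of this behaviour, using a small made-up Series (`roots` and `tail` are hypothetical names):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# NumPy functions accept a Series and return a Series with the same index
roots = np.sqrt(s)

# Positional slicing keeps the matching index labels
tail = s.iloc[1:]

print(roots)
print(tail)
```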

---

**Question:**  
A pandas `Series` is dict-like. What does that mean?

**Answer:**  
A Series is like a fixed-size dict in that you can get and set values by index label.
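
The dict-like access can be illustrated with a short sketch. The sample data and variable names are assumptions for the example:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

value_b = s["b"]      # get a value by label, like dict[key]
s["c"] = 300          # set a value by label
has_a = "a" in s      # membership tests check the index labels
missing = s.get("z")  # .get() returns None for a missing label
```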


---

**Question:**  
A student tries to change the name of a pandas Series with the following code, but as the output shows, the name doesn't change. What is wrong with the code?
```
s = pd.Series([10, 20, 30], name='books')
s.rename('cars')
print(s)
0    10
1    20
2    30
Name: books, dtype: int64
```
**Answer:**  
The `.rename()` method returns a copy of the original Series with the new name; it does not rename the original Series in place. You have two options:
- store the output of `.rename()` in a variable, for example `s = s.rename('cars')`
- assign `s.name = 'cars'` instead of calling `.rename()`.
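
Both options can be sketched like this (the names `renamed` and `'vehicles'` are only illustrative):

```python
import pandas as pd

s = pd.Series([10, 20, 30], name="books")

# Option 1: keep the renamed copy returned by .rename()
renamed = s.rename("cars")
name_after_rename = s.name  # the original is still called 'books'

# Option 2: assign to the .name attribute, which changes s itself
s.name = "vehicles"
```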


---

**Question:**  
Imagine you create a pandas Series from a NumPy array and then change the first element of the array. What does `print(s)` show?
```
arr = np.array([10, 20, 30])
s = pd.Series(arr)
arr[0] = 500
print(s)
```
**Answer:**  
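In pandas versions without Copy-on-Write, the Series shares memory with the NumPy array instead of copying it, so changing the array also changes the Series and the output is as shown. With Copy-on-Write enabled (the default from pandas 3.0), the constructor copies the array and the Series keeps its original values.
```
0    500
1     20
2     30
dtype: int64
```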
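
If the Series should not depend on later changes to the array, one option is to copy the array explicitly before building the Series. A minimal sketch:

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])

# Copy the array first so the Series owns its own data
s = pd.Series(arr.copy())

arr[0] = 500  # only the array changes
print(s)
```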

---

**Question:**  
What is the definition of pandas `DataFrame`?

**Answer:**  
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.


---

**Question:**  
How can you create a pandas `DataFrame`?

**Answer:**  
- Dictionary of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- Structured or record ndarray
- A Series
- Another DataFrame
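
A few of these inputs are sketched below. The column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd

# From a dict of lists: keys become column names
df_from_dict = pd.DataFrame({"name": ["Ali", "Reza"], "age": [25, 30]})

# From a 2-D ndarray: column names are passed explicitly
df_from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# From a Series: the Series name becomes the single column name
df_from_series = pd.DataFrame(pd.Series([1, 2, 3], name="value"))
```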


---

**Question:**  
A student wants to set the index of a DataFrame to **'number1'** and **'number2'**.
As you can see, all the data is missing. Can you explain what happened and fix the code?
```
data = {
    'name': pd.Series(['Ali', 'Reza']),
    'age': pd.Series([25, 30]),
}
df = pd.DataFrame(data, index = ['number1', 'number2'])
print(df)
        name  age
number1  NaN  NaN
number2  NaN  NaN
```

**Answer:**  
You are creating a pandas DataFrame from a dictionary of Series. By default, a Series has the index 0, 1, 2, ...

When you pass `index` to `pd.DataFrame()`, it selects those labels from the indexes of the Series. Because no Series has the labels 'number1' and 'number2', every value is `NaN`. If you want to relabel the rows, assign the new index after creating the DataFrame:
```
df = pd.DataFrame(data)
df.index = ['number1', 'number2']
print(df)
         name  age
number1   Ali   25
number2  Reza   30
```


---

**Question:**  
What is the difference between this example and the one in the previous question? Why isn't the result `NaN` here? What has changed?

**Answer:**  
```
data = {
    'name': ['Ali', 'Reza'],
    'age': [25, 30],
}
df = pd.DataFrame(data, index = ['number1', 'number2'])
print(df)
         name  age
number1   Ali   25
number2  Reza   30
```
Unlike the previous question, the dictionary values here are lists, and lists have no index. When you pass `index` to `pd.DataFrame()`, there are no existing labels to select from, so 'number1' and 'number2' are simply assigned as the index.

---

**Question:**  
When you create a pandas DataFrame from a list of dictionaries, do the dictionary keys become the index or the columns of the DataFrame? Give an example.

**Answer:**  
They become the `columns` of the DataFrame. For example:
```
data = [
    {'name': 'Ali', 'Age': 25},
    {'name': 'Reza', 'Age': 30},
]
df = pd.DataFrame(data)
print(df)
   name  Age
0   Ali   25
1  Reza   30
```


---

**Question:**  
How can you select a `column` from a pandas DataFrame (df)?

**Answer:**  
You select a column of a DataFrame by its column name. For example:
```
df['name']
```
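
As a small extension, the sketch below contrasts selecting one column with selecting several. The sample DataFrame is an assumption for the example:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ali", "Reza"], "age": [25, 30]})

ages = df["age"]              # a single label returns a Series
subset = df[["name", "age"]]  # a list of labels returns a DataFrame
```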


---

**Question:**  
How can you add a new `column` to a pandas DataFrame (df)?

**Answer:**  
You can simply assign to a new column label, `df[new_column] = values`, or use the `.insert()` method if you want to place the new column at a particular position.
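
Both approaches might look like this. The `city` and `id` columns are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ali", "Reza"], "age": [25, 30]})

# Bracket assignment appends the new column at the end
df["city"] = ["Tehran", "Shiraz"]

# .insert(loc, column, value) places the new column at position loc
df.insert(0, "id", [1, 2])

print(df)
```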


---

**Question:**  
How can you delete a `column` from a pandas DataFrame (df)?

**Answer:**  
You can use `del df[column_name]` or the `.pop()` method to delete a column. `.pop()` also returns the removed column as a Series.
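
A short sketch of both ways to remove a column, using made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ali", "Reza"], "age": [25, 30], "city": ["Tehran", "Shiraz"]}
)

del df["city"]        # removes the column and returns nothing
ages = df.pop("age")  # removes the column and returns it as a Series
```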


---

**Question:**  
What's the difference between `.insert()` and `.assign()` method?

**Answer:**  
The `.assign()` method lets you create new columns that are potentially derived from existing columns. It always returns a copy of the data and leaves the original DataFrame untouched. The `.insert()` method, in contrast, inserts a new column into the DataFrame at a specified location, in place.
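
The difference can be seen in a minimal sketch. The `price`, `qty`, `total` and `id` columns are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"price": [100, 200], "qty": [2, 3]})

# .assign() returns a new DataFrame; df itself is left unchanged
with_total = df.assign(total=lambda d: d["price"] * d["qty"])

# .insert() modifies df in place and returns None
result = df.insert(0, "id", [1, 2])
```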


---

**Question:**  
What is the difference between `df[]`, `df.loc[]` and `df.iloc[]`?

**Answer:**  
`df[column_name]` returns the column with the given name.  
`df.loc[]` and `df.iloc[]` select rows; when given a single row, they return it as a Series whose index is the columns of the DataFrame. `df.loc[]` selects by row label, while `df.iloc[]` selects purely by integer position.
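
The three accessors can be compared side by side in a small sketch (the data mirrors the earlier examples and the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ali", "Reza"], "age": [25, 30]},
    index=["number1", "number2"],
)

column = df["age"]                # column selected by label
row_by_label = df.loc["number2"]  # row selected by index label
row_by_pos = df.iloc[1]           # row selected by integer position
cell = df.loc["number1", "age"]   # row label and column label together
```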


---

**Question:**  
What is the meaning of **data alignment** in pandas?

**Answer:**  
Arithmetic between DataFrame objects automatically aligns on both the columns and the index (row labels). The resulting object has the union of the column and row labels, and positions that exist in only one operand are filled with `NaN`.
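
A minimal sketch of alignment with two partially overlapping DataFrames (the frames are made up for the example):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"b": [10, 20], "c": [30, 40]}, index=[1, 2])

# Only row 1, column 'b' exists in both frames; every other cell is NaN
total = df1 + df2
print(total)
```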


---

**Question:**  
A student wants to add the values of column **'one'** to every column of a DataFrame and tries the following code:
```
df = pd.DataFrame(
    {"one": [10, 20, 30], "two": [100, 200, 300]}
)
df = df + df['one']
print(df)
```
Unfortunately the output is as follows:
```
   one  two   0   1   2
0  NaN  NaN NaN NaN NaN
1  NaN  NaN NaN NaN NaN
2  NaN  NaN NaN NaN NaN
```
Can you help the student solve the problem?

**Answer:**  
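When a DataFrame and a Series are combined with `+`, pandas aligns the Series index with the DataFrame **columns** by default. Here the Series index is 0, 1, 2, which matches no column name, so every value is `NaN`. To align on the row labels instead, use the `.add()` method with `axis='index'`:
```
df = df.add(df['one'], axis='index')
print(df)
   one  two
0   20  110
1   40  220
2   60  330
```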


---

**Question:**  
Can you explain what happens when you use the `fill_value` option in pandas arithmetic methods?

**Answer:**  
This option is available in methods such as `add()` and `sub()`. Before the computation, pandas fills existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value. If the data in both corresponding DataFrame locations is missing, the result is missing.
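
The effect can be sketched with two small frames that have NaN in different places (the data is an assumption for the example):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
df2 = pd.DataFrame({"a": [10.0, np.nan, np.nan]})

plain = df1 + df2                    # any NaN operand gives NaN
filled = df1.add(df2, fill_value=0)  # a NaN on one side is treated as 0

print(plain)
print(filled)
```

In `filled`, the middle row stays missing because both inputs are NaN there.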


---

**Question:**  
What is the `.fillna()` method used for?

**Answer:**  
The `.fillna()` method replaces missing values (`NaN` or `None`) with a specified value.

It returns a new DataFrame unless the `inplace` parameter is set to `True`, in which case the replacement is done in the original DataFrame instead.
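
A short sketch of the default (non-inplace) behaviour, with made-up data and fill values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

filled = df.fillna(0)                       # same value everywhere
per_column = df.fillna({"a": -1, "b": -2})  # a different value per column

print(filled)
print(per_column)
```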


---

**Question:**  
What is the difference between the `.all()` and `.any()` methods in pandas?

**Answer:**  
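- `.all()` performs a logical **and** along a row or column of a DataFrame: the result is `True` only if every value is truthy. On a DataFrame it returns one Boolean per column by default.
- `.any()` performs a logical **or** along a row or column of a DataFrame: the result is `True` if at least one value is truthy. On a DataFrame it also returns one Boolean per column by default.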
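
A minimal sketch on a small Boolean DataFrame (the columns `a`, `b`, `c` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"a": [True, True], "b": [True, False], "c": [False, False]})

all_per_column = df.all()     # one Boolean per column
any_per_column = df.any()     # one Boolean per column
all_per_row = df.all(axis=1)  # one Boolean per row
```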


---

**Question:**  
Can you explain why the result of the following code is `False`? How can you check the equality of two pandas DataFrames which have `NaN` values?
```
df1 = pd.DataFrame([10, 20, np.nan], index=list('abc'))
df2 = df1
print((df2 == df1).all())
0    False
dtype: bool
```
**Answer:**  
This is because `NaN` values never compare as equal, not even to themselves. You can use the `.equals()` method to test equality; it treats NaNs in corresponding locations as equal.
```
df2.equals(df1)
True
```

---

**Question:**  
How does `df1.combine_first(df2)` work?

**Answer:**  
It combines two DataFrame objects by filling the null values in `df1` with non-null values from `df2`. The row and column indexes of the resulting DataFrame are the union of the two.
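
A small sketch of this behaviour, with made-up frames in which `df2` has an extra row and an extra column:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan]}, index=[0, 1])
df2 = pd.DataFrame(
    {"a": [100.0, 200.0, 300.0], "b": [7.0, 8.0, 9.0]},
    index=[0, 1, 2],
)

# Values from df1 win; df2 only fills the gaps and the missing labels
combined = df1.combine_first(df2)
print(combined)
```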

---
