# DataFrame Creation (MCQ)

---

**Question:**

Which of the following data structures can be used to create a Pandas DataFrame? (select all that apply)

1. Dictionary of 1D ndarrays, lists, dicts, or Series
2. 2-D numpy.ndarray
3. Structured or record ndarray
4. A Series
5. Another DataFrame

**Answer:** All five options are correct.

The `pd.DataFrame` constructor accepts each of these inputs.
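As a minimal sketch of the five input types listed above, the snippet below builds one small DataFrame from each. The variable names and sample values are illustrative only.

```python
import numpy as np
import pandas as pd

# 1. Dictionary of lists (1D ndarrays, dicts, or Series are handled similarly)
df_dict = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# 2. 2-D ndarray: column labels are passed explicitly here
df_2d = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# 3. Structured ndarray: the field names become the column labels
records = np.array([(1, 3.0), (2, 4.0)], dtype=[("a", "i4"), ("b", "f8")])
df_rec = pd.DataFrame(records)

# 4. A Series: becomes a single column named after the Series
df_series = pd.DataFrame(pd.Series([1, 2], name="a"))

# 5. Another DataFrame
df_copy = pd.DataFrame(df_dict)
```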

---

**Question:**

Which of the following Python dictionaries can be used to create a pandas DataFrame?
```python
import numpy as np
import pandas as pd

d1 = {"one": [1, 2], "two": [1, 2, 3]}
d2 = {"one": np.array([1, 2]), "two": np.array([1, 2, 3])}
d3 = {"one": pd.Series([1, 2]), "two": pd.Series([1, 2, 3])}
```

1. `d1`
2. `d2`
3. `d3`
4. All of the above

**Answer:** 3

Both `pd.DataFrame(d1)` and `pd.DataFrame(d2)` raise **ValueError: All arrays must be of the same length**, so neither dictionary can be used to create a DataFrame.

In contrast, `pd.DataFrame(d3)` creates a DataFrame with two columns. Because the first Series is shorter than the second, pandas aligns the two on their index labels and fills the missing entry with `NaN` (Not a Number).
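The behavior described above can be checked with a short sketch. The `errors` dictionary is a hypothetical helper used only to record which inputs fail; the exact error text may differ between pandas versions.

```python
import numpy as np
import pandas as pd

d1 = {"one": [1, 2], "two": [1, 2, 3]}
d2 = {"one": np.array([1, 2]), "two": np.array([1, 2, 3])}
d3 = {"one": pd.Series([1, 2]), "two": pd.Series([1, 2, 3])}

# Record the error raised for each dictionary with unequal-length values
errors = {}
for name, data in [("d1", d1), ("d2", d2)]:
    try:
        pd.DataFrame(data)
    except ValueError as exc:
        errors[name] = str(exc)

# The Series are aligned on their index, so the shorter one is padded with NaN
df3 = pd.DataFrame(d3)
```

Note that the padded column `one` is stored as floating point, because `NaN` cannot be held in a plain integer column.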

---

**Question:**

When creating a DataFrame from a dictionary of **ndarrays/lists**, what condition must all the ndarrays/lists satisfy?

1. They must have the same sum.
2. They must have the same length.
3. They must have the same mean.
4. They must have the same maximum value.

**Answer:** 2

When creating a DataFrame from a dictionary of ndarrays/lists, all the ndarrays/lists must have the **same length**, that is, each one must contain the same number of elements.
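One way to apply this rule is to compare the lengths before calling the constructor. The following is a sketch only: the `lengths` pre-check is a hypothetical convenience, not something pandas requires.

```python
import numpy as np
import pandas as pd

# A list and an ndarray can be mixed as long as their lengths match
data = {"one": [1.0, 2.0, 3.0], "two": np.array([4.0, 5.0, 6.0])}

# Hypothetical pre-check: collect the distinct lengths of the values
lengths = {len(values) for values in data.values()}

df = None
if len(lengths) == 1:
    df = pd.DataFrame(data)
```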

---

**Question:**

If no index is passed when creating a DataFrame from a dictionary of ndarray/list, what will be the **default index**?

1. The index will be a sequence of random integers.
2. The index will be a sequence of prime numbers.
3. The index will be a sequence of negative integers.
4. The index will be a sequence of consecutive integers starting from zero.

**Answer:** 4

If no index is passed, the default index will be `range(n)`, where `n` is the array length. In other words, the index will be a sequence of consecutive integers starting from zero.
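A quick check of the default index, using illustrative sample values:

```python
import pandas as pd

# No `index` argument is passed, so pandas supplies the default one
df = pd.DataFrame({"one": [10, 20, 30], "two": [40, 50, 60]})

index_labels = list(df.index)
```

Here `index_labels` holds the consecutive integers 0, 1, 2, one per row.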

---

**Question:**

When converting a dictionary to a DataFrame, how can we select only specific columns to be included in the resulting DataFrame?

1. By specifying the row labels in the constructor.
2. By passing a list of column labels to the "columns" parameter in the constructor.
3. By providing column names as keyword arguments in the constructor.
4. By using the "subset" parameter in the constructor.

**Answer:** 2

When converting a dictionary to a DataFrame, we can pass a list of column labels to the `columns` parameter of the constructor. Only the dictionary keys named in that list are included in the resulting DataFrame, in the order given.
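As a small illustration with made-up data, the `columns` argument below selects and reorders keys. The second call shows what happens when a label is not a key of the dictionary.

```python
import pandas as pd

d = {"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]}

# Keep only two of the three keys, in the order listed
df = pd.DataFrame(d, columns=["three", "one"])

# A label that is not a key of the dictionary yields a column of NaN
df_extra = pd.DataFrame(d, columns=["one", "four"])
```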

---

**Question:**

How can we provide a **customized index** for each row when creating a pandas DataFrame from a dictionary of ndarray/list?

1. By using the "custom_index" parameter in the constructor.
2. By passing a list of row indexes to the "indexes" parameter in the constructor.
3. By specifying the row index labels as a separate dictionary in the constructor.
4. By passing a list of row indexes to the "index" parameter in the constructor.

**Answer:** 4

When creating a pandas DataFrame from a dictionary of ndarrays/lists, we can assign specific labels to the rows by passing a list of row labels to the `index` parameter. The list must contain one label per row.
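A brief sketch of a custom index, with illustrative labels and values:

```python
import pandas as pd

d = {"one": [1, 2, 3], "two": [4, 5, 6]}

# One label per row; the list length matches the length of the value lists
df = pd.DataFrame(d, index=["a", "b", "c"])

value_b_two = df.loc["b", "two"]
```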

---

**Question:**

What will be the result of the following code?

```python
import pandas as pd

d = {"one": pd.Series([1, 2, 3]), "two": pd.Series([4, 5, 6])}
df = pd.DataFrame(d, index=list("abc"))
print(df)
```

1.
```text
   one  two
a    1    4
b    2    5
c    3    6
```

2.
```text
   one  two
0    1    4
1    2    5
2    3    6
```

3.
```text
   one  two
a  NaN  NaN
b  NaN  NaN
c  NaN  NaN
```

4.
```text
   one  two
0  NaN  NaN
1  NaN  NaN
2  NaN  NaN
```

**Answer:** 3

Both `pd.Series([1, 2, 3])` and `pd.Series([4, 5, 6])` create a Series with the default integer index `[0, 1, 2]`. When the DataFrame is created, the index labels are explicitly set to `["a", "b", "c"]`, and each Series is aligned to those labels. Since none of the integer labels match the requested ones, every cell in the resulting DataFrame is a missing value, represented as `NaN`.
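The alignment can be seen by comparing two cases in a short sketch. The second dictionary, `d_labeled`, is an illustrative variation that is not part of the question.

```python
import pandas as pd

labels = list("abc")

# Series with the default 0..2 index: no label matches "a", "b", or "c"
d = {"one": pd.Series([1, 2, 3]), "two": pd.Series([4, 5, 6])}
df_nan = pd.DataFrame(d, index=labels)

# Series that already carry the labels: the values align and are kept
d_labeled = {
    "one": pd.Series([1, 2, 3], index=labels),
    "two": pd.Series([4, 5, 6], index=labels),
}
df_aligned = pd.DataFrame(d_labeled, index=labels)
```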

---

**Question:**

When creating a DataFrame from a **list of dictionaries**, what happens to the keys of the different dictionary objects by default?

1. The keys are converted into separate columns of the DataFrame.
2. The keys are merged and consolidated into a single column.
3. The keys are discarded and not included in the resulting DataFrame.
4. The code raises an error.

**Answer:** 1

When creating a DataFrame from a **list of dictionaries**, each dictionary in the list represents a row in the DataFrame. By default, the keys of the dictionaries are used as column names, and their corresponding values become the values in the respective columns.
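For instance, in this minimal sketch with made-up data, each dictionary becomes one row and the keys become the column labels:

```python
import pandas as pd

rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

# Each dictionary is one row; the keys "a" and "b" become the columns
df = pd.DataFrame(rows)
```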

---

**Question:**

When creating a DataFrame from a **list of dictionaries**, what happens if a dictionary is missing a key compared to the other dictionaries in the list?

1. The missing key is filled with the value 0.
2. The missing key is ignored and not included in the resulting DataFrame.
3. The missing key is replaced with NaN in the corresponding column of the DataFrame.
4. The code raises an error.

**Answer:** 3

When creating a DataFrame from a **list of dictionaries**, if a dictionary is missing a key that other dictionaries in the list have, the DataFrame still gets a column for that key, and the row built from the dictionary that lacks it holds `NaN` (Not a Number) in that column.
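A small sketch of this case, where only the second dictionary has the key `"c"` (sample data is illustrative):

```python
import pandas as pd

rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4, "c": 5}]

# The first row has no value for "c", so that cell is filled with NaN
df = pd.DataFrame(rows)

first_c_is_missing = pd.isna(df.loc[0, "c"])
```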

---

**Question:**

Which of the following can be used to access or modify the index of an existing DataFrame in pandas?

1. Utilizing the `df.loc[]` accessor.
2. Applying the `df.set()` method.
3. Using the `df.indexes` notation.
4. Using the `df.index` notation.

**Answer:** 4

The `df.index` attribute gives direct access to the index of an existing DataFrame: reading it returns the current index, and assigning a sequence of labels, one per row, replaces it.
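Both uses of `df.index` can be sketched as follows, with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3]})

# Read the current (default) index
original = list(df.index)

# Assign new labels; the list must have one label per row
df.index = ["a", "b", "c"]
```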

---