**Exercise 2.1: Reading Data From Plain Text Files**

Which line of code will correctly parse the data in the file `hawaii_toursim.txt`, located in the current working directory, if the data is formatted as follows:

```
Year : Visitor_Arrival_Total : Visitor_Arrival_International
1966 : 834732                : 205168
1967 : 1124012               : 295163
1968 : 1313706               : 360885
```

so that the `DataFrame` `Hawaii_tourism_df` is structured as shown in the table below?

| | Year |   Visitor Arrival Total    | Visitor Arrival International |
|:--------------------------|:-------------------|:-------------------|:-----------------------|
|   0  |   1966      |  834732 |    205168|
|   1  |   1967      |  1124012 |  295163|
|   2  |   1968      |  1313706 |  360885|

A:
```python
Hawaii_tourism_df = pd.read_table('hawaii_toursim.txt')
```

B:
```python
Hawaii_tourism_df = pd.read_table('hawaii_toursim.txt', sep=':')
```

C:
```python
Hawaii_tourism_df = pd.read_csv('hawaii_toursim.txt')
```

D:
```python
Hawaii_tourism_df = pd.read_csv('hawaii_toursim.txt', sep=:)
```


**Correct Answer**

B:
```python
Hawaii_tourism_df = pd.read_table('hawaii_toursim.txt', sep=':')
```

**Explanation**

A: By default, `read_table()` separates columns by tabs, '\t', but in this file the columns are separated by colons, ':'. Each line would therefore be read as a single column.

B: To separate columns by colons, ':', set the optional `sep` parameter of `read_table()` to ':'. The default row separator, the newline, already matches this file.

C: By default, `read_csv()` separates columns by commas, ',', and rows by newlines, '\n'. This file uses colons rather than commas, so each line would be read as a single column.

D: To set the `sep` parameter of `read_csv()` or `read_table()` to the colon character, ':', the colon must be wrapped in single or double quotes so that Python treats it as a string literal rather than as syntax. The line of code in this option raises `SyntaxError: invalid syntax`.
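The options above can be compared without creating a file by reading the same text from an in-memory buffer. This is only a sketch: the `sample` string and the use of `io.StringIO` in place of `hawaii_toursim.txt` are assumptions made so the example is self-contained.

```python
import io
import pandas as pd

# Stand-in for the contents of hawaii_toursim.txt (assumed for illustration).
sample = (
    "Year : Visitor_Arrival_Total : Visitor_Arrival_International\n"
    "1966 : 834732                : 205168\n"
    "1967 : 1124012               : 295163\n"
    "1968 : 1313706               : 360885\n"
)

# Option A: the default tab separator finds no tabs, so each line is one column.
df_a = pd.read_table(io.StringIO(sample))
print(df_a.shape)

# Option B: splitting on ':' yields three columns.
df_b = pd.read_table(io.StringIO(sample), sep=':')
print(df_b.shape)

# The padding around each ':' is kept in the column labels, so strip it
# before selecting columns by name.
df_b.columns = df_b.columns.str.strip()
print(list(df_b.columns))

# Option D: an unquoted colon is rejected by Python before pandas is called.
try:
    compile("pd.read_csv('hawaii_toursim.txt', sep=:)", "<option D>", "eval")
except SyntaxError as err:
    option_d_error = type(err).__name__
    print(option_d_error)
```

Stripping the column labels is an extra step that the exercise does not require; it is included here because the spaces around each colon in the header line would otherwise become part of the labels.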

**Exercise 2.2: Indexing with `pandas` Parsing Functions**

Which line of code will correctly parse the data in the file `hawaii_toursim.txt`, located in the current working directory, if the data is formatted as follows:

```
1966, 834732, 205168
1967, 1124012, 295163
1968, 1313706, 360885
```

so that the `DataFrame` `Hawaii_tourism_df` is structured as shown in the table below?

| 0 | 1 | 2 |
|:--------------------------|:-------------------|:-------------------|
|  1966      |  834732 |    205168|
|  1967      |  1124012 |  295163|
|  1968      |  1313706 |  360885|

A:
```python
Hawaii_tourism_df = pd.read_table('hawaii_toursim.txt')
```

B:
```python
Hawaii_tourism_df = pd.read_table('hawaii_toursim.txt', sep=',', header=None)
```

C:
```python
Hawaii_tourism_df = pd.read_csv('hawaii_toursim.txt', header=None, index_col=0)
```

D:
```python
Hawaii_tourism_df = pd.read_csv('hawaii_toursim.txt', index_col=0)
```

**Correct Answer**

C:
```python
Hawaii_tourism_df = pd.read_csv('hawaii_toursim.txt', header=None, index_col=0)
```

**Explanation**

A: By default, `read_table()` assumes that the file has a header and that it is the first nonempty row, but in this problem the file has no header. Furthermore, `read_table()` separates columns by tabs, '\t', unless the `sep` parameter is set to something else in the function call.

B: In this option the `sep` parameter is set to a comma, ',', and the `header` parameter is set to `None`, as they should be. However, the $0^{th}$ column of the file should be used as the index. Unless the `index_col` parameter of `read_table()` or `read_csv()` is set, the resulting `DataFrame` has the default range of integers as its index.

C: By default, `read_csv()` separates columns by commas, ','. Since the file has no header, the `header` parameter is set to `None`, and since the $0^{th}$ column is to be the index of the resulting `DataFrame`, the `index_col` parameter is set to 0.

D: Unless told that the file has no header, `read_csv()` assumes that the first nonempty row is the header, so the 1966 row would be used as column labels instead of data.
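The effect of `header` and `index_col` can be checked on an in-memory copy of the data. This is a minimal sketch: the `sample` string and the `io.StringIO` buffer are stand-ins assumed for illustration, not part of the exercise.

```python
import io
import pandas as pd

# Stand-in for the contents of hawaii_toursim.txt (assumed for illustration).
sample = (
    "1966, 834732, 205168\n"
    "1967, 1124012, 295163\n"
    "1968, 1313706, 360885\n"
)

# Option B: no header is assumed, but the years remain an ordinary column
# and the index is the default 0, 1, 2.
df_b = pd.read_table(io.StringIO(sample), sep=',', header=None)
print(df_b.index.tolist())

# Option C: no header is assumed and the first column becomes the index.
df_c = pd.read_csv(io.StringIO(sample), header=None, index_col=0)
print(df_c.index.tolist())
print(df_c.shape)

# Option D: the 1966 row is consumed as the header, so only two data rows remain.
df_d = pd.read_csv(io.StringIO(sample), index_col=0)
print(df_d.shape)
```

Comparing the shapes makes the difference between options C and D visible: the row that option D treats as a header is missing from its data.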

**Exercise 2.3: Utilizing the Functionality of `pandas` Parsing Functions**

Which line of code will correctly parse the data in the file `hawaii_toursim.txt`, located in the current working directory, if the data is formatted as follows:

```
# This dataset describes Hawaii's toursim industry
Year, Visitor_Arrival_Total, Visitor_Arrival_International
1966, 834732, 205168
1967, 1124012,?
1968, 1313706, 360885
```

so that the `DataFrame` `Hawaii_tourism_df` is structured as shown in the table below?

| Year | Visitor_Arrival_Total | Visitor_Arrival_International |
|:--------------------------|:-------------------|:-------------------|
|  1966      |  834732 |    205168|
|  1967      |  1124012 |  NaN|
|  1968      |  1313706 |  360885|

A:
```python
Hawaii_tourism_df = pd.read_csv('hawaii_toursim.txt', index_col=0, header=None, na_values='?', sep=',')
```

B:
```python
Hawaii_tourism_df = pd.read_table('hawaii_toursim.txt', index_col='Year', skiprows=[0], na_values='?')
```

C:
```python
Hawaii_tourism_df = pd.read_csv('hawaii_toursim.txt', na_values='?')
```

D:
```python
Hawaii_tourism_df = pd.read_csv('hawaii_toursim.txt',  index_col='Year', skiprows=[0], na_values='?')
```

**Correct Answer**

D:
```python
Hawaii_tourism_df = pd.read_csv('hawaii_toursim.txt',  index_col='Year', skiprows=[0], na_values='?')
```

**Explanation**

A: Notice that the $0^{th}$ row of the file is a comment describing the data set. This row should be skipped when building the `DataFrame`, which can be done with the `skiprows` parameter of `read_csv()`. Also, the data set does have a header defining the column labels, so the `header` parameter should not be set to `None`.

B: The default behavior of `read_table()` is to separate columns by tabs, '\t', but in this exercise the columns are separated by commas, ','. The `sep` parameter would have to be set to ',' to make this option correct.

C: As in option A, the $0^{th}$ row of the file is a comment describing the data set. This row should be skipped when building the `DataFrame`, which can be done with the `skiprows` parameter of `read_csv()`. Also, the $0^{th}$ column of the data set, that is, the column labeled 'Year', should be used as the index column.

D: This option correctly skips the comment, identifies the missing value marked by the character '?', sets the index, and separates the columns of the data set.
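The combination of `skiprows`, `index_col`, and `na_values` in option D can be tried on an in-memory copy of the file. As before, this is a sketch under stated assumptions: the `sample` string and the `io.StringIO` buffer stand in for `hawaii_toursim.txt`, and the label-stripping step is an addition that the exercise does not ask for.

```python
import io
import pandas as pd

# Stand-in for the contents of hawaii_toursim.txt (assumed for illustration).
sample = (
    "# This dataset describes Hawaii's toursim industry\n"
    "Year, Visitor_Arrival_Total, Visitor_Arrival_International\n"
    "1966, 834732, 205168\n"
    "1967, 1124012,?\n"
    "1968, 1313706, 360885\n"
)

# Option D: skip the comment row, use 'Year' as the index,
# and treat '?' as a missing value.
df_d = pd.read_csv(
    io.StringIO(sample),
    index_col='Year',
    skiprows=[0],
    na_values='?',
)

# The space after each comma is kept in the column labels, so strip it
# before selecting columns by name.
df_d.columns = df_d.columns.str.strip()

print(df_d)
print(df_d.isna().sum())  # number of missing values per column
```

Passing `skipinitialspace=True` to `read_csv()` is one alternative to stripping the labels afterwards, since it drops the space that follows each comma while parsing.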