# Delimiters, headers, and extensions

#### EXERCISE:
Not all data files are clean and tidy. Pandas provides methods for reading those not-so-perfect data files that you encounter far too often.

In this exercise, you have monthly stock data for four companies downloaded from
<a href="http://finance.yahoo.com" target="_blank">Yahoo Finance</a>. The data is stored with one row per company, and each column holds an
end-of-month closing price. The file name is given to you in the variable <code>file_messy</code>.

In addition, this file has three aspects that may cause trouble for lesser tools: multiple header lines, comment records (rows) interleaved throughout the data rows, and space delimiters instead of commas.

Your job is to use pandas to read the data from this problematic <code>file_messy</code>, passing non-default input options to <code>read_csv()</code> so that the mess is tidied up at read time. Then, write the cleaned-up data to a CSV file whose name is stored in the variable <code>file_clean</code>, which has been prepared for you, as you might do in a real data workflow.

You can learn about the optional input parameters you need by calling <code>help()</code> on the pandas function <code>pd.read_csv()</code>.
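Since the <code>help()</code> output for <code>pd.read_csv()</code> is long, it can be handy to check for specific options programmatically. The following is a minimal sketch, assuming pandas is installed; the variable name <code>read_csv_params</code> is introduced here for illustration only.

```python
import inspect

import pandas as pd

# Collect the parameter names that pd.read_csv() accepts.
read_csv_params = inspect.signature(pd.read_csv).parameters

# Check for the three options this exercise relies on.
for option in ("delimiter", "header", "comment"):
    print(option, "->", option in read_csv_params)
```

Calling <code>help(pd.read_csv)</code> afterwards shows the full description of each option.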

#### INSTRUCTIONS:
* Use <code>pd.read_csv()</code> <em>without using any keyword arguments</em> to read <code>file_messy</code> into a pandas DataFrame <code>df1</code>.
* Use <code>.head()</code> to print the first 5 rows of <code>df1</code> and see how messy it is. Do this in the IPython Shell first so you can see how modifying <code>read_csv()</code> can clean up this mess.
* Using the keyword arguments <code>delimiter=' '</code>, <code>header=3</code> and <code>comment='#'</code>, use <code>pd.read_csv()</code> again to read <code>file_messy</code> into a new DataFrame <code>df2</code>.
* Print the output of <code>df2.head()</code> to verify the file was read correctly.
* Use the DataFrame method <code>.to_csv()</code> to save the DataFrame <code>df2</code> to the file named by the variable <code>file_clean</code>. Be sure to specify <code>index=False</code>.
* Use the DataFrame method <code>.to_excel()</code> to save the DataFrame <code>df2</code> to the file <code>'file_clean.xlsx'</code>. Again, remember to specify <code>index=False</code>.
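To see what these options do without touching the exercise file, here is a small sketch that runs the same steps on an in-memory stand-in. The text in <code>messy_text</code>, the tickers, and the prices are all invented for illustration and are not the contents of <code>file_messy</code>; the example only assumes the same three problems described above (extra header lines, <code>#</code> comment rows, space delimiters).

```python
import io

import pandas as pd

# Invented stand-in for a messy file: three junk lines, then the column
# labels, with '#' comment rows interleaved among space-delimited data rows.
messy_text = """\
Monthly closing prices collected for this exercise
Values are invented for illustration only
The next line holds the column labels
name Jan Feb Mar
# comment rows like this one are interleaved with the data
AAA 10.5 11.0 11.5
BBB 20.0 19.5 21.0
# another comment row
CCC 30.25 31.5 29.75
"""

# Default options: no commas in the text, so every line lands in one column.
df1 = pd.read_csv(io.StringIO(messy_text))
print(df1.shape)

# Non-default options: split on spaces, skip '#' rows, and take the column
# labels from row 3 (counted after comment rows are dropped).
df2 = pd.read_csv(io.StringIO(messy_text), delimiter=" ", header=3, comment="#")
print(df2)

# With no path given, to_csv() returns the CSV text instead of writing a file.
clean_csv = df2.to_csv(index=False)
print(clean_csv)
```

Note that <code>header=3</code> counts rows after commented lines have been ignored, which is why a comment row sitting above the column labels would not shift the count.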

#### SCRIPT.PY:

# Read the raw file as-is: df1
df1 = pd.read_csv(____)

# Print the output of df1.head()
print(df1.head())

# Read in the file with the correct parameters: df2
df2 = pd.read_csv(____, delimiter=____, header=____, comment=____)

# Print the output of df2.head()
print(df2.head())

# Save the cleaned up DataFrame to a CSV file without the index
df2.____(file_clean, index=False)

# Save the cleaned up DataFrame to an Excel file without the index
df2.____('file_clean.xlsx', index=False)