# Programmatically Bringing Data into Excel from Discrete Sources

* The purpose of this notebook is to show how Python can programmatically bring data into Excel from a discrete source.

## Importing a Single Spreadsheet: Individual Steps

In [None]:
from pandas import DataFrame, Series
from datetime import datetime
import pandas as pd

### Template to Read a CSV File into a `DataFrame`

     dataframe = pd.read_csv(path_to_csv_file, options)

### Option Decisions
- The comma-separated values (CSV) format is not strictly standardized, so several `read_csv` options need to be considered.
- **Column names:** The first row of the file often contains names for the columns.
  - If you want to use this first row as the column names, add `header=0`
  - If you do not want to use this first row, add `header=None, skiprows=1` so the row is skipped and every remaining row is treated as data
  - If there are no column names, there are two choices:
      - Add option `header=None` and all rows will be treated as data; the columns are numbered starting at 0
      - Add option `names=['list', 'of', 'column', 'names']` to supply your own names
- **Separator:**
  - If the file uses a separator other than a comma, add `sep="character used for separator"`
- **Row index:** Other than default
  - Not covered in this course
- **Not all columns wanted:**
  - Add `usecols=['list', 'of', 'column', 'numbers', 'or', 'names']`
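The option choices above can be tried without any file on disk. The following is a minimal sketch that reads a small in-memory CSV through `io.StringIO`. The `sample_csv` text, its semicolon separator, and its values are made up for illustration and are not taken from `google.csv`.

```python
import io

import pandas as pd

# Hypothetical sample data; semicolon-separated to illustrate `sep`.
sample_csv = (
    "date;open;high;low\n"
    "2024-01-02;10.0;11.0;9.5\n"
    "2024-01-03;10.5;11.5;10.0\n"
)

# First row supplies the column names.
with_header = pd.read_csv(io.StringIO(sample_csv), sep=";", header=0)

# Skip the name row and treat the rest as data; columns are numbered from 0.
data_only = pd.read_csv(io.StringIO(sample_csv), sep=";", header=None, skiprows=1)

# Skip the name row and supply replacement names.
renamed = pd.read_csv(
    io.StringIO(sample_csv), sep=";", header=None, skiprows=1,
    names=["d", "o", "h", "l"],
)

# Keep only some columns, selected by name.
subset = pd.read_csv(io.StringIO(sample_csv), sep=";", header=0, usecols=["date", "low"])

for frame in (with_header, data_only, renamed, subset):
    print(list(frame.columns))
```

Each call re-wraps the text in a fresh `io.StringIO` because a buffer can only be read through once.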

#### Example: All Columns, Column Names Defined in Row 1

In [None]:
df1 = pd.read_csv('Source_files/google.csv', header=0)

In [None]:
df1.head()

#### Example: All Columns, Skip Headers - Just Data

In [None]:
df2 = pd.read_csv('Source_files/google.csv', header = None, skiprows = 1)

In [None]:
df2.head()

#### Example: All Columns, New Column Names

In [None]:
df3 = pd.read_csv('Source_files/google.csv', header = None, names = ['d', 'o', 'h', 'l', 'c', 'v'], skiprows = 1)

In [None]:
df3.head()

#### Example: Selected Columns

- **NOTE:** You can also select columns by integer position, with the first column being 0.

In [None]:
df4 = pd.read_csv('Source_files/google.csv', header = 0, usecols = ['date', 'open', 'high', 'low'])

In [None]:
df4.head()
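To illustrate the note about integer positions, here is a small sketch comparing selection by name with selection by position. The in-memory `sample_csv` and its six-column layout are assumptions made for the example; the real `google.csv` may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical one-row sample with an assumed column layout.
sample_csv = (
    "date,open,high,low,close,volume\n"
    "2024-01-02,10.0,11.0,9.5,10.8,1000\n"
)

# Select the first four columns by name ...
by_name = pd.read_csv(
    io.StringIO(sample_csv), header=0, usecols=["date", "open", "high", "low"]
)

# ... and the same four columns by zero-based position.
by_position = pd.read_csv(io.StringIO(sample_csv), header=0, usecols=[0, 1, 2, 3])

# Both selections should produce the same DataFrame.
print(by_name.equals(by_position))
```

Positions are convenient when the file has no header row, because there are then no names to select by.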

## Writing a pandas DataFrame to xlsx (Excel) Format

In [None]:
# Create an Excel writer for the output file
output_file_path = 'Destination_files/google_all_no_labels.xlsx'
writer = pd.ExcelWriter(output_file_path)

# Write the DataFrame out
# 'google_no_labels' is the name of the sheet in the Excel workbook
# The DataFrame whose to_excel method is called (df2) is the one written out
df2.to_excel(writer, sheet_name='google_no_labels', index=False, header=False)

# Close the writer to make sure all data is written to the xlsx file
# (writer.save() was removed in pandas 2.0; close() saves and closes)
writer.close()

### Checking the Sheet

1. Open the xlsx file `google_all_no_labels.xlsx` in Excel.

In [None]:
# Create an Excel writer for the output file
output_file_path = 'Destination_files/google_all_labels.xlsx'
writer = pd.ExcelWriter(output_file_path)

# Write the DataFrame out
# 'google_labels' is the name of the sheet in the Excel workbook
# df1 was read with header=0, so its column names become the labels in row 1
# (df2 was read with header=None and would only write 0, 1, 2, ... as labels)
df1.to_excel(writer, sheet_name='google_labels', index=False, header=True)

# Close the writer to make sure all data is written to the xlsx file
# (writer.save() was removed in pandas 2.0; close() saves and closes)
writer.close()

### Checking the Sheet

1. Open the xlsx file `google_all_labels.xlsx` in Excel.
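The writer pattern used in the cells above can also be expressed with a `with` block, which closes the writer automatically so no explicit save or close call is needed. This is a sketch only: the helper name `write_dataframe_to_xlsx` is hypothetical, and it assumes an Excel engine such as `openpyxl` is installed alongside pandas.

```python
import pandas as pd


def write_dataframe_to_xlsx(df, xlsx_path, sheet_name="Sheet1", header=True):
    """Write one DataFrame to one sheet of a new xlsx file.

    Hypothetical helper: the file is saved and closed when the
    `with` block exits, even if to_excel raises an error.
    """
    with pd.ExcelWriter(xlsx_path) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False, header=header)
```

With the DataFrames from earlier cells, a call might look like `write_dataframe_to_xlsx(df1, 'Destination_files/google_all_labels.xlsx', sheet_name='google_labels')`.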

## A Little Simpler with a Function

### Creating Functions 

In [None]:
def create_dataframe_file_from_csv(csv_file_path, **options):
    import pandas as pd
    
    # Copy the keyword options and pass them straight through to read_csv
    options_passed = dict(options)
    
    df = pd.read_csv(csv_file_path, **options_passed)
    return df
        
def create_xlsx_file_from_df(dataframe, xlsx_file_path, sheet_name = None):
    import pandas as pd
    
    writer = pd.ExcelWriter(xlsx_file_path)
    
    # Without a sheet name, pandas uses its default ('Sheet1')
    if not sheet_name:
        dataframe.to_excel(writer)
    else:
        dataframe.to_excel(writer, sheet_name = sheet_name)
    # close() saves the file (writer.save() was removed in pandas 2.0)
    writer.close()
    return

def create_xlsx_file_from_csv(csv_file_path, xlsx_file_path, **options):
    import pandas as pd
    
    options_passed = dict(options)
    # sheet_name is meant for the Excel writer, not read_csv, so remove it
    sheet_name = options_passed.pop('sheet_name', None)
    
    df = create_dataframe_file_from_csv(csv_file_path, **options_passed)
    
    # Pass the sheet name on only when one was given
    if sheet_name:
        create_xlsx_file_from_df(df, xlsx_file_path, sheet_name = sheet_name)
    else:
        create_xlsx_file_from_df(df, xlsx_file_path)
    return
    

#### Using `create_xlsx_file_from_csv`

- **Calling** 
         create_xlsx_file_from_csv(csv_file_path, xlsx_file_path, optional_parameters)
         
- **The optional parameters**
  - **NOTE:** All optional parameters are passed as `parameter_name=parameter_value`. Spaces around the equals sign (=) are optional.
  - The parameters:
    - `header`
    - `skiprows`
    - `sheet_name`
    - `usecols`  **REMEMBER:** column names in `usecols` require a header row; with `header=None`, use integer column positions instead
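Because `sheet_name` belongs to the Excel writer while the other parameters belong to `read_csv`, the keyword options have to be separated before use. The sketch below shows one way to do that split. The helper `split_options` and its `writer_keys` parameter are hypothetical and are not part of the functions defined above.

```python
def split_options(options, writer_keys=("sheet_name",)):
    """Return (read_options, write_options) without modifying `options`.

    Hypothetical helper: keys listed in `writer_keys` are treated as
    Excel-writer options; everything else is assumed to be for read_csv.
    """
    read_options = {k: v for k, v in options.items() if k not in writer_keys}
    write_options = {k: v for k, v in options.items() if k in writer_keys}
    return read_options, write_options


read_opts, write_opts = split_options(
    {"header": None, "skiprows": 1, "sheet_name": "wow"}
)
print(read_opts)   # {'header': None, 'skiprows': 1}
print(write_opts)  # {'sheet_name': 'wow'}
```

Keeping the split in one place makes it harder for a writer-only option to leak into `read_csv`, which would raise a `TypeError` for the unexpected keyword.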

#### Example: No Column Name, `sheet_name` Is `"wow"`

In [None]:
create_xlsx_file_from_csv('Source_files/google.csv', 'Destination_files/google14.xlsx', 
                          header = None, skiprows = 1, sheet_name = 'wow')

#### Checking the Output
1. Open `Destination_files/google14.xlsx` in Excel and check the `sheet_name` and some of the data for accuracy.
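The same check can be done from Python instead of opening Excel. This is a minimal sketch, assuming `openpyxl` is installed so pandas can read xlsx files; the helper name `summarize_xlsx` is hypothetical.

```python
import pandas as pd


def summarize_xlsx(xlsx_path, rows=5):
    """Return the sheet names and the first rows of the first sheet.

    Hypothetical helper: header=None makes every row come back as data,
    so a label row in the sheet, if present, appears as row 0.
    """
    with pd.ExcelFile(xlsx_path) as workbook:
        sheet_names = workbook.sheet_names
        preview = workbook.parse(sheet_names[0], header=None, nrows=rows)
    return sheet_names, preview
```

For the example above, `summarize_xlsx('Destination_files/google14.xlsx')` would return the list of sheet names, which can be compared against `'wow'`, together with a small preview DataFrame for spot-checking the data.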

# End of Notebook