# Challenge 001 - Read CSV File

This challenge is taken from Carto's coding skills test (https://carto.com/careers/).
The task is "Processing a large file with Python".

## Problem
Build the following and make it run as fast as you possibly can using Python 3 (vanilla). The faster it runs, the more you will impress us!

Your code should:

- Download this ~2GB file: https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv
- Count the lines in the file
- Calculate the average value of the tip_amount field.
- All of that in the most efficient way you can come up with.

That's it. Make it fly!

(Source https://gist.github.com/jorgesancha/2a8027e5a89a2ea1693d63a45afdd8b6)

## Solution

> NOTE: I downloaded the CSV file beforehand and stored it on my local machine, so download time is not included in the timings. To try this on your own machine, download the file first and replace the file name (`Test.csv`) in the code below.
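For completeness, here is a minimal sketch of the download step using only the standard library, assuming the S3 URL is still reachable. It streams the response to disk in fixed-size chunks instead of loading ~2GB into memory. The helper name `download` and the 1 MiB chunk size are illustrative choices, not part of the original solution.

```python
import shutil
import urllib.request

# Illustrative chunk size (1 MiB); not tuned for any particular network
CHUNK_SIZE = 1 << 20


def download(url, dest, chunk_size=CHUNK_SIZE):
    """Stream `url` into the local file `dest` without holding it all in memory.

    `download` is a hypothetical helper name used only for this sketch.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # copyfileobj reads from the response and writes to disk chunk by chunk
        shutil.copyfileobj(response, out, chunk_size)
    return dest
```

Usage, assuming the same local file name as the code below: `download("https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv", "Test.csv")`.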

### Using Pandas

```python
from datetime import datetime

import pandas as pd

start = datetime.now()

# Read the CSV file into a pandas DataFrame
df = pd.read_csv("Test.csv")

# Number of data rows (pandas does not count the header line as a row)
rows = df.shape[0]

# Calculate the average of the "tip_amount" field
average = df["tip_amount"].mean()
elapsed = datetime.now() - start

print("Number of lines in file (excluding header): {}".format(rows))
print("Average: {}".format(average))
print("Time (s): {}".format(elapsed.total_seconds()))
```
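Since only `tip_amount` is needed, one likely speed-up is to have pandas build just that column through the `usecols` parameter of `pd.read_csv`. This typically reduces type conversion and memory use for the columns you do not need, but the gain has not been measured here, so time it on your own machine. A sketch, assuming the same local `Test.csv`; the helper name `pandas_tip_stats` is hypothetical:

```python
import pandas as pd


def pandas_tip_stats(path):
    """Return (data_rows, mean_tip), building only the "tip_amount" column.

    `pandas_tip_stats` is a hypothetical helper name for this sketch.
    """
    # usecols tells the parser to keep only the listed column
    tips = pd.read_csv(path, usecols=["tip_amount"])["tip_amount"]
    return len(tips), tips.mean()


# Example usage (assumes the file was downloaded as "Test.csv"):
# rows, average = pandas_tip_stats("Test.csv")
```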

### Using Python (vanilla)

```python
from datetime import datetime

start = datetime.now()
with open("Test.csv", "r", encoding="utf-8") as file:
    # The first line is the header; find the index of the "tip_amount" field
    header = next(file)
    field_index = header.rstrip("\n").split(",").index("tip_amount")

    total = 0.0
    rows = 0
    # Iterate over the data rows and sum the "tip_amount" values.
    # A plain split(",") assumes no quoted field contains a comma.
    for rows, line in enumerate(file, start=1):
        total += float(line.split(",")[field_index])

# Calculate the average (guarding against a file that has only a header)
average = total / rows if rows else float("nan")
elapsed = datetime.now() - start

print("Number of lines in file (excluding header): {}".format(rows))
print("Average: {}".format(average))
print("Time (s): {}".format(elapsed.total_seconds()))
```
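Two likely costs in the loop above are decoding every line from UTF-8 and splitting each line into all of its fields. The sketch below stays in vanilla Python but opens the file in binary mode and splits only as far as the `tip_amount` column; `float()` accepts `bytes` directly, so no decoding is needed. It assumes an ASCII-compatible file with a header row, comma separators, and no quoted fields containing commas. The helper name `tip_stats_binary` is hypothetical, and any speed-up should be measured on your own machine.

```python
def tip_stats_binary(path, field="tip_amount"):
    """Return (data_rows, mean) for `field`, reading the CSV in binary mode.

    `tip_stats_binary` is a hypothetical helper name for this sketch.
    Assumes a header row, comma separators, and no quoted commas.
    """
    with open(path, "rb") as f:
        # Parse the header once to locate the target column
        header = f.readline().rstrip(b"\r\n").split(b",")
        field_index = header.index(field.encode("ascii"))

        total = 0.0
        rows = 0
        for rows, line in enumerate(f, start=1):
            # maxsplit stops splitting right after the target field;
            # float() parses bytes and ignores surrounding whitespace/newlines
            total += float(line.split(b",", field_index + 1)[field_index])

    # NaN signals a header-only file instead of raising ZeroDivisionError
    return rows, (total / rows if rows else float("nan"))
```

The returned row count matches `rows` from the loop above: data rows only, header excluded.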