# To run, you must follow the directions below:
1. Install python3. You can follow the directions on https://realpython.com/installing-python/ for help getting set up.
2. Install the dependencies from your command line (be sure pip3 was installed as part of step 1): `pip3 install -r requirements.txt`
3. Run the program on your input and have fun :) `./normalizer < resources/sample.csv > output.csv`

This is a CSV normalizer that accepts a CSV from stdin and writes the normalized version to stdout. Normalized, in this case, means:
- The entire CSV is in the UTF-8 character set.
- The `Timestamp` column should be formatted in RFC3339 format.
- The `Timestamp` column should be assumed to be in US/Pacific time; please convert it to US/Eastern.
- All `ZIP` codes should be formatted as 5 digits. If there are fewer than 5 digits, assume 0 as the prefix.
- The `FullName` column should be converted to uppercase. There will be non-English names.
- The `Address` column should be passed through as is, except for Unicode validation. Please note there are commas in the Address field; your CSV parsing will need to take that into account. Commas will only be present inside a quoted string.
- The `FooDuration` and `BarDuration` columns are in HH:MM:SS.MS format (where MS is milliseconds); please convert them to the total number of seconds.
- The `TotalDuration` column is filled with garbage data. For each row, please replace the value of `TotalDuration` with the sum of `FooDuration` and `BarDuration`.
- The `Notes` column is free form text input by end-users; please do not perform any transformations on this column. If there are invalid UTF-8 characters, please replace them with the Unicode Replacement Character.
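The field-level rules above can be sketched roughly as follows. This is a minimal illustration, not necessarily how `normalizer` itself is written. The helper names and the `TIMESTAMP_FORMAT` string are assumptions (the format is a guess at the sample data and may need adjusting), and `zoneinfo` requires Python 3.9+ with a time zone database available. `America/Los_Angeles` and `America/New_York` are the canonical names behind `US/Pacific` and `US/Eastern`.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

PACIFIC = ZoneInfo("America/Los_Angeles")  # canonical name for US/Pacific
EASTERN = ZoneInfo("America/New_York")     # canonical name for US/Eastern

# Assumed input layout, e.g. "4/1/11 11:00:00 AM"; adjust to the real data.
TIMESTAMP_FORMAT = "%m/%d/%y %I:%M:%S %p"


def normalize_timestamp(raw: str) -> str:
    """Treat a naive timestamp as US/Pacific and return RFC3339 in US/Eastern."""
    naive = datetime.strptime(raw.strip(), TIMESTAMP_FORMAT)
    pacific = naive.replace(tzinfo=PACIFIC)
    return pacific.astimezone(EASTERN).isoformat()


def normalize_zip(raw: str) -> str:
    """Left-pad a ZIP code with zeros to 5 digits."""
    return raw.strip().zfill(5)


def duration_to_seconds(raw: str) -> float:
    """Convert HH:MM:SS.MS into a total number of seconds."""
    hours, minutes, seconds = raw.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def total_duration(foo: str, bar: str) -> float:
    """Sum FooDuration and BarDuration, ignoring any existing TotalDuration."""
    return duration_to_seconds(foo) + duration_to_seconds(bar)
```

`FullName` needs no helper in this sketch: Python's built-in `str.upper()` is Unicode-aware, so it also handles non-English names.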
Safe Assumptions:

- The input document is in UTF-8, although some characters may be incorrectly encoded.
- Invalid characters can be replaced with the Unicode Replacement Character. If that replacement makes data invalid (for example, because it turns a date field into something unparseable), print a warning to `stderr` and drop the row from your output.
- Times that are missing timezone information are in `US/Pacific`.
- The sample data we provide contains all date and time format variants you will need to handle.
- Any type of line endings are permissible in the output.
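One way to apply these assumptions is sketched below: decode the raw input with replacement, parse it with the standard `csv` module, and drop any row whose conversion fails. The names `read_rows` and `normalize_rows`, and the `transform` callback, are hypothetical and used only for illustration. The sketch also assumes that a failed conversion raises `ValueError`.

```python
import csv
import io
import sys


def read_rows(raw: bytes):
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD, then parse as CSV."""
    text = raw.decode("utf-8", errors="replace")
    # newline="" leaves line endings untouched so the csv module can handle
    # quoted fields (and embedded commas) itself.
    return csv.DictReader(io.StringIO(text, newline=""))


def normalize_rows(rows, transform, warn=sys.stderr):
    """Yield transform(row) for each row; warn and skip rows that fail to convert."""
    for record_number, row in enumerate(rows, start=1):
        try:
            yield transform(row)
        except ValueError as error:
            # A field became unparseable (possibly after replacement): drop the row.
            print(f"warning: dropping record {record_number}: {error}", file=warn)
```

In a real run the bytes could come from `sys.stdin.buffer.read()`, which bypasses Python's own text decoding so the replacement behaviour stays under the program's control.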