# CSV Scraper
The process is almost identical to [web scraping](https://nbviewer.jupyter.org/github/rguo123/m2g-lims/blob/master/docs/Web_Scraper.ipynb). The only difference is that we must first download the file behind each link before we can parse the CSVs.

This is easy in Python: we can call urllib.request.urlretrieve(), which is part of the Python 3 standard library. Note, however, that our code needs a data_path in which to store the CSVs. The GitHub repository provides one in [data/csv/](https://github.com/rguo123/m2g-lims/tree/master/data/csv). If you want to store the CSVs somewhere else, you must edit the code in csv_scraper.py.

In [None]:
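import os
import urllib.request

def download_csvs(csv_links, data_path):
    """Download each CSV link into data_path and return the local file paths."""
    filenames = []
    for link in csv_links:
        # name the local file after the last component of the URL
        filename = os.path.join(data_path, link.split('/')[-1])
        filenames.append(filename)
        urllib.request.urlretrieve(link, filename)
    return filenames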
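
As a rough illustration of how a link maps to a local file, the sketch below computes the destination path without downloading anything. The helper names local_path_for and ensure_data_path and the example URL are hypothetical and are not part of csv_scraper.py. The sketch also assumes it is acceptable to create data_path when it is missing, since urlretrieve does not create directories on its own.

```python
import os
from urllib.parse import urlparse

def local_path_for(link, data_path):
    # Hypothetical helper: map a CSV link to the path it would be saved under.
    # urlparse separates out any query string so it stays out of the file name.
    name = os.path.basename(urlparse(link).path)
    return os.path.join(data_path, name)

def ensure_data_path(data_path):
    # Create the target directory if it does not exist yet.
    os.makedirs(data_path, exist_ok=True)
    return data_path

# Example with a made-up URL
print(local_path_for("https://example.com/files/site_a.csv?raw=1", "data/csv"))
```

Calling ensure_data_path(data_path) before the download loop would avoid a failure on the first urlretrieve call when the directory is absent.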

## Parsing CSVs
The only thing left to mention is how we parse the CSV data. It is straightforward and relies entirely on another module from the Python standard library: csv. Note that we discard every metadata field that has no value, which the CSV file denotes with a "#". This reduces clutter in our LIMS, but it also means the set of metadata fields can differ among subjects in the same dataset, and even among scan sessions for the same subject.

We return the parsed data as a list of dictionaries where each dictionary is the metadata for a particular scan session.

In [None]:
import csv

def parse_csv(filenames):
    metadata_list = []
    for filename in filenames:
        with open(filename, 'r', newline='') as csvfile:
            reader = csv.reader(csvfile)
            # first row holds the metadata field names
            keys = next(reader)
            # each remaining row is the metadata for one scan session
            for row in reader:
                metadata = {}
                for key, value in zip(keys, row):
                    # skip fields with no value
                    if value == "#":
                        continue
                    metadata[key] = value
                metadata_list.append(metadata)

    return metadata_list
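
To make the effect of dropping "#" fields concrete, here is a small sketch that applies the same filtering rule to an in-memory CSV. The sample rows and field names are invented for illustration and do not come from any real dataset.

```python
import csv
import io

# Hypothetical CSV content: "#" marks a field with no value.
sample = io.StringIO(
    "subject,session,age,handedness\n"
    "sub-01,1,24,R\n"
    "sub-01,2,#,R\n"
    "sub-02,1,31,#\n"
)

reader = csv.reader(sample)
keys = next(reader)  # header row with the metadata field names
records = [
    {key: value for key, value in zip(keys, row) if value != "#"}
    for row in reader
]

# Each record only contains the fields that had a value in its row.
for record in records:
    print(sorted(record))
```

Because missing fields are simply absent from a record, code that consumes the list is safer reading values with record.get("age") than with record["age"].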