## CSV files

CSV files are one of the most commonly used file formats for sharing data.

In the `data` folder there is a subfolder called `iwCouncilSpending` that contains several CSV files.

## Using Unix commands to examine CSV file content

The `ls` command will list the contents of a folder, showing the CSV files and the file sizes in bytes (the number before the file date).

In [2]:
!ls -l data/iwCouncilSpending

total 2973
-rwxrwxrwx 1 vagrant vagrant 1526174 Jun 27  2014 PUBLISHED FORMAT - NOV 2013.csv
-rwxrwxrwx 1 vagrant vagrant 1514658 Dec 27  2015 csvFromWeb.csv
-rwxrwxrwx 1 vagrant vagrant    1457 Jun 27  2014 sample1.csv
-rwxrwxrwx 1 vagrant vagrant    1163 Jun 27  2014 sample2.csv


The file size tells you how large the file is and gives you a clue as to how many lines of data are in the file.

One of the easiest ways of previewing a CSV file is from the command line using the `head` command. By default this previews the first 10 lines of the file. If the name of the file you want to preview contains spaces, you can either escape each space with a backslash, or enclose the whole file path in quotes:

* `!head data/iwCouncilSpending/PUBLISHED\ FORMAT\ -\ NOV\ 2013.csv`
* `!head 'data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv'`
   
The `-n` switch can be used to change the number of lines shown. 

For example, the following command previews the first 5 lines:

In [3]:
!head -n 5 'data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv'

Capital or Revenue,Directorate,Transaction Number,Date,Service Area,Expenses Type,Amount,Supplier Name
Revenue,Community Wellbeing & Social Care,5105636098,13.11.2013,Public Libraries Central,Marketing Costs,200.00,REDACTED PERSONAL DATA
Revenue,Community Wellbeing & Social Care,5105635705,08.11.2013,Drug Misuse - Adults,Charges from Independent Providers,120.00,REDACTED PERSONAL DATA
Revenue,Childrens Services,5105637261,20.11.2013,Thompson House Tuition Centre (PRU),Professional Services,240.00,* M BOWDERY T/A SPOTLIGHT BOUTIQUE
Revenue,Community Wellbeing & Social Care,5105637069,27.11.2013,Safeguarding Adults,Professional Services,"5,285.00",REDACTED PERSONAL DATA


We can count the number of lines in the file using the `wc` command with the `-l` switch:

In [4]:
!wc -l 'data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv'

11413 data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv


The first number output is the number of lines, followed by the filename. How many lines are in the file you selected?

Remember, in a CSV file a data *row* may actually be split over several lines. This means that the line count is an upper bound on the number of rows of data that may be in the file.

It always makes sense to get a feel for the size of a file before you try to open it. The `head` command is a safe way of previewing the first few lines of the file. The `tail` command, conversely, allows you to preview the last few lines of the file. This can be useful if you want to know whether the file contains a blank line at the end.

In [5]:
!tail -n 5 'data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv'

Revenue,Economy & Environment,GPC,31.10.2013,Play Development,Operational Equipment,127.99,GLENWAY PRODUCTS LTD
Revenue,Economy & Environment,GPC,04.11.2013,Play Development,Operational Equipment,91.99,GREENHAM TRADING LTD
Revenue,Economy & Environment,GPC,06.11.2013,Play Development,Travel Expenses,470.70,WIGHTLINK
Revenue,Economy & Environment,GPC,13.11.2013,Play Development,Operational Equipment,108.42,WWW.SPORTSFRONT.CO.UK
Revenue,Economy & Environment,GPC,14.11.2013,Sports Development - Admin,Training,12.50,WWW.RFU.COM


If we try to read in the CSV file a line at a time, we are likely to run into complications if the value of a cell contains a line break that splits a single data *row* over several *file* lines, so we would benefit from using a library for CSV file handling.
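
The difference between file lines and data rows can be illustrated with a minimal sketch. The data here is made up for the purpose (it is not taken from the spending files), and Python's standard `csv` module is used simply as one example of a CSV-aware reader.

```python
import csv
import io

# Hypothetical CSV text: a header plus two data rows, where one quoted
# cell contains a line break.
raw = 'Supplier Name,Notes\n"ACME LTD","first line\nsecond line"\n"WIGHTLINK","ferry"\n'

# Counting newline-delimited lines, much as `wc -l` does, gives 4.
line_count = len(raw.splitlines())

# csv.reader treats the quoted line break as part of the cell,
# so it yields 3 records: the header and two data rows.
rows = list(csv.reader(io.StringIO(raw, newline='')))

print(line_count, len(rows))
print(rows[1])
```

Splitting the text on line breaks would cut the second record in half, which is the complication described above.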

## CSV and _pandas_

The _pandas_ library provides a set of utilities for reading (and writing) CSV files that can cope with issues such as data rows split over more than one file line. The _pandas_ library also contains routines for working with CSV data, and it is these that we shall be using throughout the module. (Python's standard `csv` module also provides a reader, `DictReader`, that reads each CSV row into a *dict*, using the column headers as dict keys.)
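
As a brief aside, reading rows into dicts keyed by column header might look like the following sketch. It assumes Python's standard `csv` module and its `DictReader` class, with a small made-up sample rather than one of the spending files.

```python
import csv
import io

# Hypothetical sample data with a header row.
raw = 'Directorate,Amount\nResources,28.90\nCorporate,12.50\n'

# Each record becomes a dict whose keys come from the header row.
records = list(csv.DictReader(io.StringIO(raw, newline='')))

print(records[0]['Directorate'], records[0]['Amount'])
```

Note that every value is read as a string; unlike _pandas_, the `csv` module makes no attempt to infer column types.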

In [6]:
import pandas as pd

In the `data/iwCouncilSpending/` directory there is a small file, `sample1.csv`, containing a sample of rows (including a header row) from one of the spending data files. We can import that data into pandas in the following way.

In [8]:
pd.read_csv('data/iwCouncilSpending/sample1.csv')

Unnamed: 0,Capital or Revenue,Directorate,Transaction Number,Date,Service Area,Expenses Type,Amount,Supplier Name
0,Revenue,Community Wellbeing & Social Care,5105636098,13.11.2013,Public Libraries Central,Marketing Costs,200.0,REDACTED PERSONAL DATA
1,Revenue,Community Wellbeing & Social Care,5105635705,08.11.2013,Drug Misuse - Adults,Charges from Independent Providers,120.0,REDACTED PERSONAL DATA
2,Revenue,Childrens Services,5105637261,20.11.2013,Thompson House Tuition Centre (PRU),Professional Services,240.0,* M BOWDERY T/A SPOTLIGHT BOUTIQUE
3,Revenue,Community Wellbeing & Social Care,5105637069,27.11.2013,Safeguarding Adults,Professional Services,5285.0,REDACTED PERSONAL DATA
4,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA
5,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA
6,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA
7,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA
8,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA


### Exercise
The same directory also contains a file `sample2.csv`. In this exercise use the cells below to first preview the file and then load it in using the _pandas_ `read_csv()` function. What happens this time?

<div style="color:blue">

In [18]:
# Preview the sample2.csv file using the command line. 
#    What else can you learn about it from the command line?
!head -n 5 'data/iwCouncilSpending/sample2.csv'

Revenue,Community Wellbeing & Social Care,GPC,23.11.2013,Training - Childrens,Licences,28.96,SURVEYMONKEY.COM
Revenue,Community Wellbeing & Social Care,GPC,27.11.2013,Workforce Development - Early Years,Postage,6.95,POST OFFICE COUNTERS
Revenue,Resources,GPC,27.11.2013,Organisational Development � Leadership,Professional Services,28.90,WWW.PRINTEDPAPERPRODUCTS.CO.UK
Revenue,Community Wellbeing & Social Care,GPC,27.11.2013,Hub Coordinators,Travel Expenses,25.00,STAGECOACH SOUTH
Revenue,Community Wellbeing & Social Care,GPC,27.11.2013,Workforce Development -  Westridge Centre,Catering Purchases,30.58,TESCO STORE 5567
/bin/sh: 1: Syntax error: word unexpected (expecting ")")


In [25]:
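# Use the pandas read_csv() function to load in sample2.csv. What happens?
# Note: the leading `!` passes this line to the shell rather than to Python,
#    so the syntax error shown below comes from /bin/sh, not from pandas.
#    Remove the `!` to actually call read_csv().
!pd.read_csv('data/iwCouncilSpending/sample2.csv')
# PT: we can also use grep to look for specific words or patterns in the file.
!grep 'Early Years' 'data/iwCouncilSpending/sample2.csv'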

/bin/sh: 1: Syntax error: word unexpected (expecting ")")
Revenue,Community Wellbeing & Social Care,GPC,27.11.2013,Workforce Development - Early Years,Postage,6.95,POST OFFICE COUNTERS


<div style="color:blue">CSV file fails to load

## Back to the pandas CSV 
If you have a large file, you may not want to load it into memory all at once. Instead, you might want to load the data in a row at a time, or a chunk of rows at a time. The `nrows` parameter allows you to define how many rows you want to load in from the start of the file, in much the same way as the command-line `head` command does.

In [26]:
pd.read_csv('data/iwCouncilSpending/sample1.csv', nrows=3)

Unnamed: 0,Capital or Revenue,Directorate,Transaction Number,Date,Service Area,Expenses Type,Amount,Supplier Name
0,Revenue,Community Wellbeing & Social Care,5105636098,13.11.2013,Public Libraries Central,Marketing Costs,200,REDACTED PERSONAL DATA
1,Revenue,Community Wellbeing & Social Care,5105635705,08.11.2013,Drug Misuse - Adults,Charges from Independent Providers,120,REDACTED PERSONAL DATA
2,Revenue,Childrens Services,5105637261,20.11.2013,Thompson House Tuition Centre (PRU),Professional Services,240,* M BOWDERY T/A SPOTLIGHT BOUTIQUE


Sometimes you might want to load in the whole file, but handle the rows in chunks. The `chunksize` parameter allows you to read data in from a file *chunksize* rows at a time.

In [27]:
chunks = pd.read_csv('data/iwCouncilSpending/sample1.csv', chunksize=4)
for chunk in chunks:
    print('New chunk...')
    print(chunk[ ['Transaction Number', 'Date', 'Amount'] ])

New chunk...
   Transaction Number        Date    Amount
0          5105636098  13.11.2013    200.00
1          5105635705  08.11.2013    120.00
2          5105637261  20.11.2013    240.00
3          5105637069  27.11.2013  5,285.00
New chunk...
   Transaction Number        Date  Amount
0          5105637605  22.11.2013  695.89
1          5105637605  22.11.2013  695.89
2          5105637605  22.11.2013  695.89
3          5105637605  22.11.2013  695.89
New chunk...
   Transaction Number        Date  Amount
0          5105637605  22.11.2013  695.89
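
Chunked reading is most useful when each chunk is reduced to a small result before moving on to the next. The following is a minimal sketch of that pattern, assuming a made-up in-memory sample in place of a large file.

```python
import io
import pandas as pd

# Made-up data standing in for a file too large to load at once.
raw = (
    'Date,Amount\n'
    '13.11.2013,200.00\n'
    '08.11.2013,120.00\n'
    '20.11.2013,240.00\n'
    '27.11.2013,95.50\n'
    '22.11.2013,695.89\n'
)

total = 0.0
row_count = 0
# Only two rows are held in memory at any one time.
for chunk in pd.read_csv(io.StringIO(raw), chunksize=2):
    total += chunk['Amount'].sum()
    row_count += len(chunk)

print(row_count, round(total, 2))
```

Only the running total and row count are kept between chunks, so memory use does not grow with the size of the file.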


In many cases, you may find that the data you want to work with is available as a data file on the web. You could download such a file using a web browser, or by any other means, and then load your downloaded copy of the file into the Notebook. Or, you can load a file directly into the Notebook from a web URL using pandas' `read_csv()` function - simply use the file's URL as the filename. 

In [28]:
csvFromWeb = pd.read_csv('http://www.iwight.com/documentlibrary/download/november-2013-transparency-data-csv')
csvFromWeb[:5]

URLError: <urlopen error [Errno -2] Name or service not known>

### Writing DataFrames to CSV files

To write the data in a DataFrame out to a CSV file, we can call the DataFrame's `to_csv()` method with a target filename.

In [30]:
# Write the data frame to the named file.
# PT: as I don't have an internet connection, I'll load a local sample file
#     into csvFromWeb instead of loading the data from the URL.
csvFromWeb = pd.read_csv('data/iwCouncilSpending/sample1.csv')
csvFromWeb.to_csv('data/iwCouncilSpending/csvFromWeb.csv')
# Show the first 5 lines of the newly written file.
pd.read_csv('data/iwCouncilSpending/csvFromWeb.csv', nrows=5)

Unnamed: 0.1,Unnamed: 0,Capital or Revenue,Directorate,Transaction Number,Date,Service Area,Expenses Type,Amount,Supplier Name
0,0,Revenue,Community Wellbeing & Social Care,5105636098,13.11.2013,Public Libraries Central,Marketing Costs,200.0,REDACTED PERSONAL DATA
1,1,Revenue,Community Wellbeing & Social Care,5105635705,08.11.2013,Drug Misuse - Adults,Charges from Independent Providers,120.0,REDACTED PERSONAL DATA
2,2,Revenue,Childrens Services,5105637261,20.11.2013,Thompson House Tuition Centre (PRU),Professional Services,240.0,* M BOWDERY T/A SPOTLIGHT BOUTIQUE
3,3,Revenue,Community Wellbeing & Social Care,5105637069,27.11.2013,Safeguarding Adults,Professional Services,5285.0,REDACTED PERSONAL DATA
4,4,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA


If you examine the output you will see that, by default, both the index column and the header row are written out. Either or both can be disabled.

In [31]:
csvFromWeb.to_csv('data/iwCouncilSpending/csvFromWeb.csv', index=False, header=False)
pd.read_csv('data/iwCouncilSpending/csvFromWeb.csv', nrows=5)

Unnamed: 0,Revenue,Community Wellbeing & Social Care,5105636098,13.11.2013,Public Libraries Central,Marketing Costs,200.00,REDACTED PERSONAL DATA
0,Revenue,Community Wellbeing & Social Care,5105635705,08.11.2013,Drug Misuse - Adults,Charges from Independent Providers,120.0,REDACTED PERSONAL DATA
1,Revenue,Childrens Services,5105637261,20.11.2013,Thompson House Tuition Centre (PRU),Professional Services,240.0,* M BOWDERY T/A SPOTLIGHT BOUTIQUE
2,Revenue,Community Wellbeing & Social Care,5105637069,27.11.2013,Safeguarding Adults,Professional Services,5285.0,REDACTED PERSONAL DATA
3,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA
4,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA


Ooops! `read_csv()` expects a header line, but we chose not to write one.

If we know there is no header line, then we need to declare this when reading the data in.

In [32]:
pd.read_csv('data/iwCouncilSpending/csvFromWeb.csv', nrows=5, header=None)

Unnamed: 0,0,1,2,3,4,5,6,7
0,Revenue,Community Wellbeing & Social Care,5105636098,13.11.2013,Public Libraries Central,Marketing Costs,200.0,REDACTED PERSONAL DATA
1,Revenue,Community Wellbeing & Social Care,5105635705,08.11.2013,Drug Misuse - Adults,Charges from Independent Providers,120.0,REDACTED PERSONAL DATA
2,Revenue,Childrens Services,5105637261,20.11.2013,Thompson House Tuition Centre (PRU),Professional Services,240.0,* M BOWDERY T/A SPOTLIGHT BOUTIQUE
3,Revenue,Community Wellbeing & Social Care,5105637069,27.11.2013,Safeguarding Adults,Professional Services,5285.0,REDACTED PERSONAL DATA
4,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA


To add headers to an unheaded CSV file, we can use the `names` parameter to `read_csv()`. 

We can get a list of the column names from the original file using `columns.values.tolist()`. 

In [33]:
origNames = csvFromWeb.columns.values.tolist()
origNames

['Capital or Revenue',
 'Directorate',
 'Transaction Number',
 'Date',
 'Service Area',
 'Expenses Type',
 'Amount',
 'Supplier Name']

In [42]:
!head -n 1 'data/iwCouncilSpending/csvFromWeb.csv'

Revenue,Community Wellbeing & Social Care,5105636098,13.11.2013,Public Libraries Central,Marketing Costs,200.00,REDACTED PERSONAL DATA


In [43]:
pd.read_csv('data/iwCouncilSpending/csvFromWeb.csv', nrows=5, header=None, names=origNames)

Unnamed: 0,Capital or Revenue,Directorate,Transaction Number,Date,Service Area,Expenses Type,Amount,Supplier Name
0,Revenue,Community Wellbeing & Social Care,5105636098,13.11.2013,Public Libraries Central,Marketing Costs,200.0,REDACTED PERSONAL DATA
1,Revenue,Community Wellbeing & Social Care,5105635705,08.11.2013,Drug Misuse - Adults,Charges from Independent Providers,120.0,REDACTED PERSONAL DATA
2,Revenue,Childrens Services,5105637261,20.11.2013,Thompson House Tuition Centre (PRU),Professional Services,240.0,* M BOWDERY T/A SPOTLIGHT BOUTIQUE
3,Revenue,Community Wellbeing & Social Care,5105637069,27.11.2013,Safeguarding Adults,Professional Services,5285.0,REDACTED PERSONAL DATA
4,Revenue,Community Wellbeing & Social Care,5105637605,22.11.2013,Leaseholds by LA,Accommodation Costs - Leaseholder Payments,695.89,REDACTED PERSONAL DATA
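
A common middle ground is to keep the header row but drop the index column, so that a file can be written and read back without gaining an extra `Unnamed: 0` column. The sketch below assumes a small made-up DataFrame and writes to a string rather than to a file.

```python
import io
import pandas as pd

# Small made-up DataFrame for illustration.
df = pd.DataFrame({'Date': ['13.11.2013', '08.11.2013'],
                   'Amount': [200.0, 120.0]})

# With no filename, to_csv() returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# Reading the text back shows the effect on the column names.
cols_with_index = pd.read_csv(io.StringIO(with_index)).columns.tolist()
cols_without_index = pd.read_csv(io.StringIO(without_index)).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```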


To read in just a specific set of columns, use the `usecols` parameter.

In [44]:
tmp = pd.read_csv('data/iwCouncilSpending/sample1.csv', nrows=3, usecols=['Date','Transaction Number','Amount'])
tmp

Unnamed: 0,Transaction Number,Date,Amount
0,5105636098,13.11.2013,200
1,5105635705,08.11.2013,120
2,5105637261,20.11.2013,240


For more information on `read_csv()`, see the *pandas* documentation: [pandas.io.parsers.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html).

For more information on `DataFrame.to_csv()`, see the *pandas* documentation for  [pandas.DataFrame.to_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html).

### Typing data in columns

In many cases a CSV file will contain columns of data that conform to a particular type, although the type may not be detected when the data is simply read in.

In [45]:
# We need to specify the encoding of this file, otherwise pandas will fail to read it.
# A good first guess for the encoding of a CSV file that does not open with
#    the default settings is ISO-8859-1 (also known as latin-1).

# For some notes on how to find out what encoding a particular file uses,
# see the 02.2.0 Data file formats - file encodings Notebook.

df = pd.read_csv('data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv', encoding="ISO-8859-1")
df.dtypes

Capital or Revenue    object
Directorate           object
Transaction Number    object
Date                  object
Service Area          object
Expenses Type         object
Amount                object
Supplier Name         object
dtype: object

Another useful test to run on one or more columns is a summary review of the unique values that exist within a column.

In [46]:
uniquevalues = df['Directorate'].unique()
uniquevalues

array(['Community Wellbeing & Social Care', 'Childrens Services',
       'Economy & Environment', 'Resources', 'Corporate'], dtype=object)

In [54]:
# PT: to check whether all the values in a column are unique, we could compare
#     the number of values with the number of distinct values.
allColumnValsUnique = len(df['Transaction Number']) == len(df['Transaction Number'].unique())
allColumnValsUnique

False
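
The same uniqueness check can be written more directly. This sketch assumes a small made-up Series in place of the real `Transaction Number` column; `is_unique` and `duplicated()` are standard _pandas_ Series members.

```python
import pandas as pd

# Made-up transaction identifiers, some of them repeated.
txn = pd.Series(['5105636098', '5105637605', '5105637605', 'GPC', 'GPC'],
                name='Transaction Number')

# True only if no value appears more than once.
all_unique = txn.is_unique

# duplicated(keep=False) flags every occurrence of a repeated value.
repeated = txn[txn.duplicated(keep=False)].unique().tolist()

print(all_unique)
print(repeated)
```

Listing the repeated values is often more informative than the bare `False`, because it shows which identifiers are shared between rows.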

In [55]:
# This will fail with an error message
pd.to_numeric(df['Amount'])

ValueError: Unable to parse string

In [56]:
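# Do the same again, but this time tell the to_numeric() function 
# to ignore any errors while it attempts the conversion.
# With errors='ignore', a column that cannot be converted is returned unchanged.
# (Recent pandas releases deprecate this option.)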
pd.to_numeric(df['Amount'], errors='ignore') 

0          200.00
1          120.00
2          240.00
3        5,285.00
4          695.89
5          695.89
6          695.89
7          695.89
8          695.89
9          216.90
10         495.00
11         826.50
12         383.50
13         120.00
14          50.00
15         120.00
16          30.00
17          25.00
18         310.00
19          65.00
20         310.00
21         155.00
22          25.00
23          60.00
24          80.00
25         620.00
26         100.00
27          80.00
28         100.00
29         100.00
           ...   
11382       18.05
11383        5.10
11384       48.70
11385       47.60
11386        8.50
11387        4.00
11388       39.81
11389        9.99
11390       34.21
11391        3.96
11392        7.07
11393       48.70
11394       32.00
11395      157.80
11396        4.00
11397       28.99
11398       28.99
11399       28.99
11400       28.99
11401       28.99
11402       28.96
11403        6.95
11404       28.90
11405       25.00
11406     

Unfortunately, the comma in the value `5,285.00` in the row with index 3 (the fourth data row) means this entry is seen as a string that cannot be simply converted to a number.
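
In a long column it can be hard to spot which values are blocking the conversion. One possible approach, sketched here with a made-up Series, is to convert with `errors='coerce'`, which turns unparseable values into `NaN`, and then look up the original strings at those positions.

```python
import pandas as pd

# Made-up amounts, one of which uses a thousands separator.
amounts = pd.Series(['200.00', '120.00', '5,285.00', '695.89'])

# Values that cannot be parsed become NaN instead of raising an error.
converted = pd.to_numeric(amounts, errors='coerce')

# Select the original strings wherever the conversion produced NaN.
problem_values = amounts[converted.isna()].tolist()

print(problem_values)
```

If the column already contains genuinely missing values, those will be selected too, so the result is a list of candidates to inspect rather than a definitive list of errors.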

What happens if we try a more direct route, saying we specifically want to cast the values in the column to a `float` type?
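
A minimal sketch of that direct cast follows, using a made-up two-value Series rather than the full `Amount` column. The cast is wrapped in `try`/`except` so that the error message can be printed.

```python
import pandas as pd

# Made-up amounts; the second contains a thousands separator.
amounts = pd.Series(['200.00', '5,285.00'])

try:
    amounts.astype(float)
    cast_failed = False
except ValueError as err:
    # The string '5,285.00' cannot be interpreted as a float.
    cast_failed = True
    print(err)
```

The direct cast fails for the same reason as `to_numeric()`: the comma has to be dealt with first.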

A solution to this is to get rid of the comma using a `str.replace()` function applied to each value in the column, and then use the `to_numeric()` function:

In [57]:
pd.to_numeric(df['Amount'].str.replace(',',''))

0         200.00
1         120.00
2         240.00
3        5285.00
4         695.89
5         695.89
6         695.89
7         695.89
8         695.89
9         216.90
10        495.00
11        826.50
12        383.50
13        120.00
14         50.00
15        120.00
16         30.00
17         25.00
18        310.00
19         65.00
20        310.00
21        155.00
22         25.00
23         60.00
24         80.00
25        620.00
26        100.00
27         80.00
28        100.00
29        100.00
          ...   
11382      18.05
11383       5.10
11384      48.70
11385      47.60
11386       8.50
11387       4.00
11388      39.81
11389       9.99
11390      34.21
11391       3.96
11392       7.07
11393      48.70
11394      32.00
11395     157.80
11396       4.00
11397      28.99
11398      28.99
11399      28.99
11400      28.99
11401      28.99
11402      28.96
11403       6.95
11404      28.90
11405      25.00
11406      30.58
11407     127.99
11408      91.99
11409     470.

However, there is an easier way. With delimiters commonly being used to separate 'thousands' in many number strings, _pandas_ usefully provides a way of automatically handling these as the CSV file is read. Specifically, you can handle the separator automatically by setting the `thousands` parameter appropriately.

In [58]:
df = pd.read_csv('data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv', thousands=',',
                  encoding="latin-1")
df.dtypes

Capital or Revenue     object
Directorate            object
Transaction Number     object
Date                   object
Service Area           object
Expenses Type          object
Amount                float64
Supplier Name          object
dtype: object
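
The effect of the `thousands` parameter can be seen in isolation with a small sketch. The two-row sample is made up, and it assumes (as in the spending file) that values containing a comma are quoted in the CSV text.

```python
import io
import pandas as pd

# Made-up sample; the second amount is quoted because it contains a comma.
raw = ('Service Area,Amount\n'
       'Public Libraries Central,200.00\n'
       'Safeguarding Adults,"5,285.00"\n')

# Without the parameter the column is left as text.
plain = pd.read_csv(io.StringIO(raw))

# With thousands=',' the separator is stripped and the column is numeric.
parsed = pd.read_csv(io.StringIO(raw), thousands=',')

print(plain['Amount'].dtype, parsed['Amount'].dtype)
print(parsed['Amount'].tolist())
```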

If the `read_csv()` function can identify a column type unambiguously, it will do so. For example, if we read only the first few lines of the CSV file, every value in the `Amount` column can be converted to a numeric type without error, so `read_csv()` sets the column type automatically.

In [59]:
df = pd.read_csv('data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv', nrows=3)
df.dtypes

Capital or Revenue     object
Directorate            object
Transaction Number      int64
Date                   object
Service Area           object
Expenses Type          object
Amount                float64
Supplier Name          object
dtype: object

Notice that the file loaded that time without an error, but without setting the encoding. 
Why? 

We only loaded in the first three lines - presumably the encoded characters that were causing the problem are not in these three lines.
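
This explanation can be checked on a small scale. The sketch below uses made-up bytes in which only the last row contains a non-ASCII character; it is an illustration of the idea, not a reproduction of the actual spending file.

```python
import io
import pandas as pd

# Made-up file contents, encoded as latin-1. Only the last row
# contains a non-ASCII character (an accented e).
data = 'Service Area,Amount\nLibraries,200.00\nCaf\u00e9,12.50\n'.encode('latin-1')

# The first two lines are plain ASCII, so they decode as UTF-8 without trouble.
first_lines = b'\n'.join(data.split(b'\n')[:2]).decode('utf-8')

# The file as a whole is not valid UTF-8.
try:
    data.decode('utf-8')
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Declaring the encoding lets pandas read every row.
everything = pd.read_csv(io.BytesIO(data), encoding='latin-1')

print(decodes_as_utf8)
print(everything['Service Area'].tolist())
```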

We can also try to force the type using the `dtype` parameter, though if an illegal cast is attempted an error will be raised.

In [60]:
df = pd.read_csv('data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv',
                  nrows=3, dtype={'Transaction Number':float})
df.dtypes

Capital or Revenue     object
Directorate            object
Transaction Number    float64
Date                   object
Service Area           object
Expenses Type          object
Amount                float64
Supplier Name          object
dtype: object
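
To see what an illegal cast looks like, consider a made-up sample in which the `Transaction Number` column mixes numeric identifiers with a text code such as `GPC` (as the tail of the full file does). This is a sketch under that assumption; the exact wording of the error message may vary between _pandas_ versions.

```python
import io
import pandas as pd

# Made-up sample: the second transaction number is not numeric.
raw = 'Transaction Number,Amount\n5105636098,200.00\nGPC,28.96\n'

try:
    pd.read_csv(io.StringIO(raw), dtype={'Transaction Number': float})
    cast_error = None
except ValueError as err:
    # 'GPC' cannot be converted to a float, so the read is rejected.
    cast_error = str(err)

print(cast_error)
```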

### Parsing dates

One column type that appears in many datasets is dates, or dates and times, although the way in which dates are presented varies widely from dataset to dataset. For example, 12/3/14 (read day-first), 12-Mar-2014 and 2014-03-12 can all represent the same date. If we name a date column in the `parse_dates` parameter, _pandas_ `read_csv()` will try to detect the corresponding date representation automatically.

In [61]:
df = pd.read_csv('data/iwCouncilSpending/PUBLISHED FORMAT - NOV 2013.csv',
                 parse_dates=['Date'], encoding="latin-1")
df.dtypes

Capital or Revenue            object
Directorate                   object
Transaction Number            object
Date                  datetime64[ns]
Service Area                  object
Expenses Type                 object
Amount                        object
Supplier Name                 object
dtype: object

Sometimes there may be ambiguity about whether the day or the month is given first. For example, is 1/2/13 the 1st of February or a US-style 2nd of January? Specifying `dayfirst=True` tells _pandas_ to assume the first convention (day before month).
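
The effect of `dayfirst` is easiest to see on a single ambiguous string. This sketch uses `pd.to_datetime()`, which accepts the same `dayfirst` argument as `read_csv()`; the date string is chosen purely for illustration.

```python
import pandas as pd

# By default an ambiguous date is read month-first: 2 January 2013.
month_first = pd.to_datetime('01/02/2013')

# With dayfirst=True the same string is read as 1 February 2013.
day_first = pd.to_datetime('01/02/2013', dayfirst=True)

print(month_first.date(), day_first.date())
```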

A date format can also be declared explicitly - to see how, check the documentation: [pandas.io.parsers.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html).
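
One way to declare the format explicitly, sketched here as an alternative to doing it inside `read_csv()`, is to read the column as text and then convert it with `pd.to_datetime()` and a `format` string. The sample data is made up but follows the `DD.MM.YYYY` style used in the spending files. Recent _pandas_ versions also accept a `date_format` argument to `read_csv()`; check the documentation for the version you are using.

```python
import io
import pandas as pd

# Made-up sample using the day.month.year style.
raw = 'Date,Amount\n13.11.2013,200.00\n08.11.2013,120.00\n'

df = pd.read_csv(io.StringIO(raw))

# %d = day, %m = month, %Y = four-digit year, separated by full stops.
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')

print(df['Date'].dt.day.tolist(), df['Date'].dt.month.tolist())
```

An explicit format removes any guessing: a value that does not match the pattern raises an error instead of being silently misread.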

## Summary
In this Notebook you have seen how to:
1. use a range of Unix commands to find out what is in a file
2. use *pandas* to read a CSV file
3. write data in a DataFrame to a CSV file
4. examine the datatypes and parse dates when reading a CSV file
5. find the *pandas* documentation for `read_csv()` and `DataFrame.to_csv()`.


## What next?

If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to look at `02.2.2 Data File Formats - JSON`. 