# ETL Processes

**Use this notebook to perform the full ETL (Extract, Transform, Load) process and populate your database with data from the provided datasets.**  
The notebook is divided into two parts:

### 1. Mock-up Process  
In this section, you will explore the datasets and experiment with transforming the data. Your goal is to write `INSERT` queries that insert **a single sample record** into each relevant table. This helps you understand the structure and logic needed before scaling up to the full dataset.

### 2. Completed Process  
Once your `INSERT` queries work correctly for a single record, you'll clean up the database tables and reuse those queries to **load all the data** into your database.

## Part 1. Mock-up Process

In [1]:
import os
import glob
import psycopg2
import pandas as pd

In [2]:
conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

In [3]:
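def get_files(filepath):
    """Return the absolute paths of all *.json files under `filepath`, searching subdirectories."""
    all_files = []
    for root, _dirs, _files in os.walk(filepath):
        # glob only the current directory; os.walk supplies every subdirectory
        for f in glob.glob(os.path.join(root, '*.json')):
            all_files.append(os.path.abspath(f))

    return all_files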
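For comparison, the same recursive search can be written with `pathlib.Path.rglob`, which walks subdirectories itself. This is a sketch, not a drop-in replacement: `get_files_pathlib` is a hypothetical name, and unlike `get_files` it sorts the paths, so "the first file" is the same on every machine (the order returned by `os.walk` and `glob` depends on the file system).

```python
import os
import tempfile
from pathlib import Path


def get_files_pathlib(filepath):
    """Return sorted absolute paths of every *.json file under `filepath`.

    Hypothetical alternative to `get_files`: rglob recurses into all
    subdirectories, so no explicit os.walk loop is needed.
    """
    return sorted(str(p.resolve()) for p in Path(filepath).rglob('*.json'))


# Usage: build a small throwaway directory tree and search it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'A', 'B'))
    for name in ('A/one.json', 'A/B/two.json', 'A/B/notes.txt'):
        Path(tmp, name).write_text('{}')
    found = get_files_pathlib(tmp)
    # notes.txt is ignored because it does not match *.json
    print(sorted(os.path.basename(p) for p in found))  # ['one.json', 'two.json']
```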

# Process `song_data`
In this first part, you'll perform ETL on the first dataset, `song_data`, to create the `songs` and `artists` dimensional tables.

Let's perform ETL on a single song file and load a single record into each table to start.
- Use the `get_files` function provided above to get a list of all song JSON files in `data/song_data`
- Select the first song in this list
- Read the song file and view the data

In [4]:
# TODO: Get the first file from data/song_data directory and parse it into
#       a df object.
song_files = get_files('data/song_data')
filepath = song_files[0]
df = pd.read_json(filepath, lines=True)
df.head()

|   | artist_id | artist_latitude | artist_location | artist_longitude | artist_name | duration | num_songs | song_id | title | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ARNTLGG11E2835DDB9 |  |  |  | Clp | 266.39628 | 1 | SOUDSGM12AC9618304 | Insatiable (Instrumental Version) | 0 |
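`pd.read_json(filepath, lines=True)` treats the file as JSON Lines: one JSON object per line, each becoming one DataFrame row. The standard-library sketch below parses a record shaped like the song shown above and then picks the `songs` columns in the same order as the `INSERT` placeholders. The raw JSON line is an assumption reconstructed from the displayed values (not copied from the dataset), and `read_json_lines` is a hypothetical helper.

```python
import io
import json


def read_json_lines(fileobj):
    """Parse JSON Lines: each non-blank line is one JSON object (one record)."""
    return [json.loads(line) for line in fileobj if line.strip()]


# One song record, as it might appear on a single line of a song_data file.
sample = io.StringIO(
    '{"num_songs": 1, "artist_id": "ARNTLGG11E2835DDB9", "artist_latitude": null, '
    '"artist_longitude": null, "artist_location": "", "artist_name": "Clp", '
    '"song_id": "SOUDSGM12AC9618304", "title": "Insatiable (Instrumental Version)", '
    '"duration": 266.39628, "year": 0}\n'
)
records = read_json_lines(sample)

# Select the songs-table fields in placeholder order (song_id, title, artist_id, year, duration).
song_columns = ['song_id', 'title', 'artist_id', 'year', 'duration']
sample_song_data = [records[0][c] for c in song_columns]
print(sample_song_data)
```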


## #1: `songs` Table
#### Extract Data for Songs Table
- Select columns for song ID, title, artist ID, year, and duration
- Use `df.values` to select just the values from the dataframe
- Index to select the first (only) record in the dataframe
- Convert the array to a list and set it to `song_data`

In [5]:
# TODO: Select the first record's values, convert them to a list.
song_data = df[['song_id', 'title', 'artist_id', 'year', 'duration']].values[0].tolist()
song_data

['SOUDSGM12AC9618304',
 'Insatiable (Instrumental Version)',
 'ARNTLGG11E2835DDB9',
 0,
 266.39628]

#### Insert Record into Song Table

Write the `song_table_insert` query and run the cell below to insert a new song record into the `songs` table.

Make sure your query handles potential duplicate `song_id` values gracefully—if a duplicate is encountered, the query should **do nothing** instead of raising an error.
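To see what `ON CONFLICT (song_id) DO NOTHING` does without touching `sparkifydb`, here is a sketch that uses Python's built-in `sqlite3` as a stand-in. SQLite 3.24+ accepts the same `ON CONFLICT ... DO NOTHING` clause, but it uses `?` placeholders instead of psycopg2's `%s`, and the table definition is a simplified assumption rather than the project's schema. The `demo_` names avoid overwriting the notebook's `conn` and `cur`.

```python
import sqlite3

demo_conn = sqlite3.connect(':memory:')
demo_cur = demo_conn.cursor()
# Simplified songs table; song_id needs a unique constraint for ON CONFLICT to apply.
demo_cur.execute("""
    CREATE TABLE songs (
        song_id TEXT PRIMARY KEY, title TEXT, artist_id TEXT,
        year INTEGER, duration REAL)
""")

demo_insert = """
    INSERT INTO songs (song_id, title, artist_id, year, duration)
    VALUES (?, ?, ?, ?, ?)
    ON CONFLICT (song_id) DO NOTHING
"""
demo_row = ['SOUDSGM12AC9618304', 'Insatiable (Instrumental Version)',
            'ARNTLGG11E2835DDB9', 0, 266.39628]

demo_cur.execute(demo_insert, demo_row)
demo_cur.execute(demo_insert, demo_row)  # duplicate song_id: skipped, no error raised
demo_conn.commit()

song_count = demo_cur.execute('SELECT COUNT(*) FROM songs').fetchone()[0]
print(song_count)  # 1
```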

In [6]:
# TODO: Write an INSERT query for the songs table that accepts
#       record values as its parameters.
song_table_insert = ("""
    INSERT INTO songs(song_id, title, artist_id, year, duration)
    VALUES(%s, %s, %s, %s, %s)
    ON CONFLICT (song_id) DO NOTHING;
""")

**Note:** If you get the following error when running the next code block:

> InternalError: current transaction is aborted, commands ignored until end of transaction block

This error happens because **a previous SQL command failed**, and the current transaction is now in an aborted state. In PostgreSQL, once a transaction fails, you must **roll it back** before executing any further commands.

To fix this, create a new code block and run this code in it:

```
conn.rollback()
```

After rolling back, you may delete the code block and safely re-run your query.

The same fix applies to any other `cur.execute()` call in this notebook.
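If you would rather not add and delete a rollback cell each time, one option is to wrap each statement so that a failure rolls the transaction back automatically before the error propagates. This is a sketch: `execute_and_commit` is a hypothetical helper, and it relies only on the DB-API methods `cursor.execute`, `connection.commit`, and `connection.rollback`, which psycopg2 provides. The usage runs against `sqlite3` only so that it works without a PostgreSQL server.

```python
import sqlite3


def execute_and_commit(conn, cur, query, params=None):
    """Run one statement and commit; on any error, roll back and re-raise.

    In PostgreSQL, rolling back clears the 'current transaction is aborted'
    state, so the next cur.execute() call starts a fresh transaction.
    """
    try:
        if params is None:
            cur.execute(query)
        else:
            cur.execute(query, params)
        conn.commit()
    except Exception:
        conn.rollback()
        raise


# Usage with an in-memory SQLite database standing in for sparkifydb.
demo_conn = sqlite3.connect(':memory:')
demo_cur = demo_conn.cursor()
execute_and_commit(demo_conn, demo_cur, 'CREATE TABLE t (id INTEGER PRIMARY KEY)')
execute_and_commit(demo_conn, demo_cur, 'INSERT INTO t VALUES (?)', (1,))
try:
    # Duplicate primary key: the helper rolls back, then re-raises.
    execute_and_commit(demo_conn, demo_cur, 'INSERT INTO t VALUES (?)', (1,))
except sqlite3.IntegrityError as err:
    print('rolled back after:', err)

# The connection is still usable after the rollback.
execute_and_commit(demo_conn, demo_cur, 'INSERT INTO t VALUES (?)', (2,))
rows = demo_cur.execute('SELECT id FROM t ORDER BY id').fetchall()
print(rows)  # [(1,), (2,)]
```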

In [7]:
cur.execute(song_table_insert, song_data)
conn.commit()

### Test

Run this test code block to verify that a record was successfully inserted into the table.
If everything is working correctly, the output should display the same record you inserted in the previous step.

In [8]:
cur.execute("SELECT * FROM songs LIMIT 5")
results = cur.fetchall()
results

[('SOUDSGM12AC9618304',
  'Insatiable (Instrumental Version)',
  'ARNTLGG11E2835DDB9',
  0,
  266.39628),
 ('SOMJBYD12A6D4F8557',
  'Keepin It Real (Skit)',
  'ARD0S291187B9B7BF5',
  0,
  114.78159),
 ('SOMZWCG12A8C13C480',
  "I Didn't Mean To",
  'ARD7TVE1187B99BFB1',
  0,
  218.93179),
 ('SOFSOCN12A8C143F5D',
  'Face the Ashes',
  'ARXR32B1187FB57099',
  2007,
  209.60608),
 ('SONHOTT12A8C13493C',
  'Something Girls',
  'AR7G5I41187FB4CE6C',
  1982,
  233.40363)]

## #2: `artists` Table
#### Extract Data for Artists Table
- Select columns for artist ID, name, location, latitude, and longitude
- Use `df.values` to select just the values from the dataframe
- Index to select the first (only) record in the dataframe
- Convert the array to a list and set it to `artist_data`

In [9]:
# TODO: Select the first record's values, convert them to a list
artist_data = df[['artist_id', 'artist_name', 'artist_location', 'artist_latitude', 'artist_longitude']].values[0].tolist()
artist_data

['ARNTLGG11E2835DDB9', 'Clp', '', nan, nan]

#### Insert Record into Artist Table

Write the `artist_table_insert` query and run the cell below to insert a new artist record into the `artists` table.

Make sure your query handles potential duplicate `artist_id` values gracefully—if a duplicate is encountered, the query should **do nothing** instead of raising an error.

In [10]:
# TODO: Write an INSERT query for the artists table that accepts
#       record values as its parameters.
artist_table_insert = ("""
    INSERT INTO artists(artist_id, name, location, latitude, longitude)
    VALUES(%s, %s, %s, %s, %s)
    ON CONFLICT (artist_id) DO NOTHING;
""")

In [11]:
cur.execute(artist_table_insert, artist_data)
conn.commit()

### Test

In [12]:
cur.execute("SELECT * FROM artists LIMIT 5")
results = cur.fetchall()
results

[('ARNTLGG11E2835DDB9', 'Clp', '', nan, nan),
 ('ARD0S291187B9B7BF5', 'Rated R', 'Ohio', nan, nan),
 ('ARD7TVE1187B99BFB1', 'Casual', 'California - LA', nan, nan),
 ('ARXR32B1187FB57099', 'Gob', '', nan, nan),
 ('AR7G5I41187FB4CE6C', 'Adam Ant', 'London, England', nan, nan)]
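The `nan` values above are stored because `artist_data` carries pandas' `NaN` for missing coordinates, and psycopg2 sends a float NaN as a numeric `NaN` value rather than SQL `NULL`. If you would rather store `NULL` for a missing latitude or longitude (an assumption about what you want, not a project requirement), one option is to map NaN to `None` before executing the insert, since psycopg2 adapts `None` to `NULL`. `nan_to_none` is a hypothetical helper.

```python
import math


def nan_to_none(values):
    """Replace float NaN entries with None so psycopg2 inserts SQL NULL instead."""
    # numpy.float64 subclasses float, so this also catches NaNs coming from pandas.
    return [None if isinstance(v, float) and math.isnan(v) else v for v in values]


artist_row = ['ARNTLGG11E2835DDB9', 'Clp', '', float('nan'), float('nan')]
print(nan_to_none(artist_row))  # ['ARNTLGG11E2835DDB9', 'Clp', '', None, None]
```

In the notebook this would be `cur.execute(artist_table_insert, nan_to_none(artist_data))`; the test query would then show `None` instead of `nan`.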

# Process `log_data`
In this part, you'll perform ETL on the second dataset, `log_data`, to create the `time` and `users` dimensional tables, as well as the `songplays` fact table.

Let's perform ETL on a single log file and load records into each table.
- Use the `get_files` function provided above to get a list of all log JSON files in `data/log_data`
- Select the first log file in this list
- Read the log file and view the data

In [13]:
# TODO: Get the first file from data/log_data directory and parse it into
#       a df object.
log_files = get_files('data/log_data')
filepath = log_files[0]
df = pd.read_json(filepath, lines=True)
df.head()

|   | artist | auth | firstName | gender | itemInSession | lastName | length | level | location | method | page | registration | sessionId | song | status | ts | userAgent | userId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mitch Ryder & The Detroit Wheels | Logged In | Tegan | F | 65 | Levine | 205.03465 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | Jenny Take A Ride (LP Version) | 200 | 1543363215796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |
| 1 | The Spill Canvas | Logged In | Tegan | F | 66 | Levine | 358.03383 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | The TIde (LP Version) | 200 | 1543363420796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |
| 2 | Mogwai | Logged In | Tegan | F | 67 | Levine | 571.19302 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | Two Rights Make One Wrong | 200 | 1543363778796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |
| 3 | Spor | Logged In | Tegan | F | 68 | Levine | 380.3424 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | Way Of The Samurai | 200 | 1543364349796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |
| 4 | DJ Dizzy | Logged In | Tegan | F | 69 | Levine | 221.1522 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | Sexy Bitch | 200 | 1543364729796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |


## #3: `time` Table
#### Extract Data for Time Table
- Filter records by `NextSong` action
- Convert the `ts` timestamp column to datetime
  - Hint: `ts` is a Unix timestamp in milliseconds
- Extract the timestamp, hour, day, week of year, month, year, and weekday from the `ts` column and set `time_data` to a list containing these values in order
  - Hint: use pandas' [`dt` accessor](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html) to easily access datetime-like properties.
- Specify labels for these columns and set them to `column_labels`
- Create a dataframe, `time_df`, containing the time data for this file by combining `column_labels` and `time_data` into a dictionary and converting it into a dataframe
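As a cross-check on the pandas steps above, the same fields can be computed for a single `ts` value with the standard library. The sketch assumes `ts` counts milliseconds since the Unix epoch in UTC, which is how `pd.to_datetime(..., unit='ms')` interprets it; `time_fields` is a hypothetical helper. `weekday()` counts Monday as 0, matching pandas' `dt.weekday`, and the week is the ISO 8601 week number.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_fields(ts_ms):
    """Return (start_time, hour, day, week, month, year, weekday) for a ms timestamp."""
    # timedelta arithmetic keeps the milliseconds exact (no float rounding).
    dt = EPOCH + timedelta(milliseconds=ts_ms)
    iso_week = dt.isocalendar()[1]  # ISO 8601 week number
    return (ts_ms, dt.hour, dt.day, iso_week, dt.month, dt.year, dt.weekday())


# First NextSong event in the log file shown above.
print(time_fields(1543363215796))  # (1543363215796, 0, 28, 48, 11, 2018, 2)
```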

In [14]:
# TODO: Filter records by NextSong action
df = df[df['page'] == 'NextSong']
df.head()

|   | artist | auth | firstName | gender | itemInSession | lastName | length | level | location | method | page | registration | sessionId | song | status | ts | userAgent | userId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mitch Ryder & The Detroit Wheels | Logged In | Tegan | F | 65 | Levine | 205.03465 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | Jenny Take A Ride (LP Version) | 200 | 1543363215796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |
| 1 | The Spill Canvas | Logged In | Tegan | F | 66 | Levine | 358.03383 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | The TIde (LP Version) | 200 | 1543363420796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |
| 2 | Mogwai | Logged In | Tegan | F | 67 | Levine | 571.19302 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | Two Rights Make One Wrong | 200 | 1543363778796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |
| 3 | Spor | Logged In | Tegan | F | 68 | Levine | 380.3424 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | Way Of The Samurai | 200 | 1543364349796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |
| 4 | DJ Dizzy | Logged In | Tegan | F | 69 | Levine | 221.1522 | paid | Portland-South Portland, ME | PUT | NextSong | 1540794000000.0 | 992 | Sexy Bitch | 200 | 1543364729796 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4... | 80 |


In [15]:
# TODO: Convert the ts timestamp column to datetime
t = pd.to_datetime(df['ts'], unit='ms')
t.head()

0   2018-11-28 00:00:15.796
1   2018-11-28 00:03:40.796
2   2018-11-28 00:09:38.796
3   2018-11-28 00:19:09.796
4   2018-11-28 00:25:29.796
Name: ts, dtype: datetime64[ns]

In [16]:
# TODO: Extract the timestamp, hour, day, week of year, month, year, and  
#       weekday from the ts column and set time_data to a list containing 
#       these values in order
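# `Series.dt.week` was deprecated in pandas 1.1 and removed in 2.0;
# `dt.isocalendar().week` (pandas >= 1.1) gives the same ISO week number.
week = t.dt.isocalendar().week.astype(int) if hasattr(t.dt, 'isocalendar') else t.dt.week
time_data = (df['ts'], t.dt.hour, t.dt.day, week, t.dt.month, t.dt.year, t.dt.weekday)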
column_labels = ('start_time', 'hour', 'day', 'week', 'month', 'year', 'weekday')

In [17]:
# TODO: Create a dataframe containing the time data for this file by 
#       combining column_labels and time_data into a dictionary and 
#       converting this into a dataframe
time_df = pd.DataFrame(data=dict(zip(column_labels, time_data)))
time_df.head()

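|   | start_time | hour | day | week | month | year | weekday |
|---|---|---|---|---|---|---|---|
| 0 | 1543363215796 | 0 | 28 | 48 | 11 | 2018 | 2 |
| 1 | 1543363420796 | 0 | 28 | 48 | 11 | 2018 | 2 |
| 2 | 1543363778796 | 0 | 28 | 48 | 11 | 2018 | 2 |
| 3 | 1543364349796 | 0 | 28 | 48 | 11 | 2018 | 2 |
| 4 | 1543364729796 | 0 | 28 | 48 | 11 | 2018 | 2 |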


#### Insert Records into Time Table
Write the `time_table_insert` query and run the cell below to insert the time records into the `time` table.

Make sure your query handles potential duplicate `start_time` values gracefully—if a duplicate is encountered, the query should **do nothing** instead of raising an error.

In [18]:
# TODO: Write an INSERT query for the time table that accepts
#       record values as its parameters.
time_table_insert = ("""
    INSERT INTO time(start_time, hour, day, week, month, year, weekday)
    VALUES(%s, %s, %s, %s, %s, %s, %s)
    ON CONFLICT (start_time) DO NOTHING;
""")

In [19]:
for i, row in time_df.iterrows():
    cur.execute(time_table_insert, list(row))
    conn.commit()

### Test

In [20]:
cur.execute("SELECT * FROM time LIMIT 5")
results = cur.fetchall()
results

[(1543363215796, 0, 28, 48, 11, 2018, 2),
 (1543363420796, 0, 28, 48, 11, 2018, 2),
 (1543363778796, 0, 28, 48, 11, 2018, 2),
 (1543364349796, 0, 28, 48, 11, 2018, 2),
 (1543364729796, 0, 28, 48, 11, 2018, 2)]

## #4: `users` Table
#### Extract Data for Users Table
- Select columns for user ID, first name, last name, gender and level and set to `user_df`

In [21]:
# TODO: Create the user_df dataframe
user_df = df[['userId', 'firstName', 'lastName', 'gender', 'level']]

#### Insert Records into Users Table

Write the `user_table_insert` query and run the cell below to insert user records into the `users` table.

**Important: This part is a bit different from previous queries**

Make sure your query handles potential duplicate `user_id` values gracefully: if a duplicate is encountered, the query should **update the existing row's `level` field to the new record's `level`.**

Learn more about the `ON CONFLICT` clause, with examples, in the [PostgreSQL `INSERT` documentation](https://www.postgresql.org/docs/current/sql-insert.html).
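Here is a sketch of the `DO UPDATE SET level = EXCLUDED.level` behavior, using Python's built-in `sqlite3` as a stand-in (SQLite 3.24+ supports the same upsert clause and `excluded` pseudo-table; placeholders are `?` instead of `%s`). `EXCLUDED` refers to the row that was proposed for insertion, so a user who switches from `free` to `paid` keeps a single row with the new level. The table definition and the user below are hypothetical.

```python
import sqlite3

demo_conn = sqlite3.connect(':memory:')
demo_cur = demo_conn.cursor()
demo_cur.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
        gender TEXT, level TEXT)
""")

upsert_user = """
    INSERT INTO users (user_id, first_name, last_name, gender, level)
    VALUES (?, ?, ?, ?, ?)
    ON CONFLICT (user_id) DO UPDATE SET level = EXCLUDED.level
"""
# Hypothetical user who appears twice in a log: first free, later paid.
demo_cur.execute(upsert_user, (1, 'Jane', 'Doe', 'F', 'free'))
demo_cur.execute(upsert_user, (1, 'Jane', 'Doe', 'F', 'paid'))  # conflict -> update level
demo_conn.commit()

user_rows = demo_cur.execute('SELECT user_id, level FROM users').fetchall()
print(user_rows)  # [(1, 'paid')]
```

With the notebook's row-by-row loop, this means the last row processed for a given `userId` determines the stored `level`.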

In [22]:
# TODO: Write an INSERT query for the users table that accepts
#       record values as its parameters and updates level on conflict.
user_table_insert = ("""
    INSERT INTO users(user_id, first_name, last_name, gender, level)
    VALUES(%s, %s, %s, %s, %s)
    ON CONFLICT (user_id) DO UPDATE set level = EXCLUDED.level;
""")

In [23]:
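for i, row in user_df.iterrows():
    # Pass a plain list so the %s placeholders are filled by position;
    # indexing a label-indexed Series by integer position is deprecated in recent pandas.
    cur.execute(user_table_insert, list(row))
    conn.commit()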

### Test

In [24]:
cur.execute("SELECT * FROM users LIMIT 5")
results = cur.fetchall()
results

[(92, 'Ryann', 'Smith', 'F', 'free'),
 (74, 'Braden', 'Parker', 'M', 'free'),
 (55, 'Martin', 'Johnson', 'M', 'free'),
 (40, 'Tucker', 'Garrison', 'M', 'free'),
 (9, 'Wyatt', 'Scott', 'M', 'free')]

## #5: `songplays` Table
#### Extract Data for Songplays Table
This one is a little more complicated because the `songplays` table needs information from the songs table, the artists table, and the original log file. Since the log file does not specify an ID for either the song or the artist, you'll need to get the song ID and artist ID by querying the songs and artists tables for matches based on song title, artist name, and song duration.
- Write the `song_select` query (in the same cell as `songplay_table_insert`) to find the song ID and artist ID based on the title, artist name, and duration of a song.
- Select the timestamp, user ID, level, song ID, artist ID, session ID, location, and user agent and set them to `songplay_data`
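Conceptually, `song_select` is a lookup keyed on (title, artist name, duration) that falls back to `(None, None)` when nothing matches, which is why most `songplays` rows end up with NULL IDs. Below is a dictionary-based sketch of that logic, built from the one song/artist pair shown earlier in this notebook; `song_index` and `lookup_ids` are hypothetical names, and the real lookup runs in PostgreSQL. Like the SQL `=` comparison on `duration`, the key only matches on an exact float value.

```python
# (title, artist name, duration) -> (song_id, artist_id), from the songs/artists rows above.
song_index = {
    ('Insatiable (Instrumental Version)', 'Clp', 266.39628):
        ('SOUDSGM12AC9618304', 'ARNTLGG11E2835DDB9'),
}


def lookup_ids(title, artist, length):
    """Mimic song_select plus the `if results ... else (None, None)` fallback."""
    return song_index.get((title, artist, length), (None, None))


print(lookup_ids('Insatiable (Instrumental Version)', 'Clp', 266.39628))
print(lookup_ids('Jenny Take A Ride (LP Version)', 'Mitch Ryder & The Detroit Wheels', 205.03465))
```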

#### Insert Records into Songplays Table
- Write the `songplay_table_insert` query, then complete and run the cell below it to insert a record for each songplay action in this log file into the `songplays` table.

In [25]:
# TODO: Write an INSERT query for the songplays table that accepts
#       record values as its parameters.
songplay_table_insert = ("""
    INSERT INTO songplays(
        start_time,
        user_id,
        level,
        song_id,
        artist_id,
        session_id,
        location,
        user_agent)
    VALUES(%s, %s, %s, %s, %s, %s, %s, %s);
""")

# TODO: Write a SELECT query that finds the song ID and artist ID 
#       based on the title, artist name, and duration of a song.
song_select = ("""
    SELECT s.song_id, s.artist_id
    FROM songs s 
        INNER JOIN artists a ON s.artist_id = a.artist_id
    WHERE s.title = %s AND a.name = %s AND s.duration = %s;
""")

In [26]:
for index, row in df.iterrows():

    # get songid and artistid from song and artist tables
    cur.execute(song_select, (row.song, row.artist, row.length))
    results = cur.fetchone()
    
    if results:
        songid, artistid = results
    else:
        songid, artistid = None, None

    # TODO: Prepare the values for each songplay record
    songplay_data = (row['ts'],
                     row['userId'],
                     row['level'],
                     songid, artistid,
                     row['sessionId'],
                     row['location'],
                     row['userAgent'])
    cur.execute(songplay_table_insert, songplay_data)
    conn.commit()

### Test


In [27]:
cur.execute("SELECT * FROM songplays LIMIT 5")
results = cur.fetchall()
results

[(7184,
  1543363215796,
  80,
  'paid',
  None,
  None,
  992,
  'Portland-South Portland, ME',
  '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"'),
 (7185,
  1543363420796,
  80,
  'paid',
  None,
  None,
  992,
  'Portland-South Portland, ME',
  '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"'),
 (7186,
  1543363778796,
  80,
  'paid',
  None,
  None,
  992,
  'Portland-South Portland, ME',
  '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"'),
 (7187,
  1543364349796,
  80,
  'paid',
  None,
  None,
  992,
  'Portland-South Portland, ME',
  '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"'),
 (7188,
  1543364729796,
  80,
  'paid',
  None,
  None,
  992,
  'Portland-South Portland, ME',

### Close Connection to Sparkify Database

In [28]:
conn.close()

## Part 2. Completed Process

Now that you've written all the `*_table_insert` queries and built the code to prepare and insert individual records into your database, it's time to **formalize the ETL process**.

In this step, you will:
- Create reusable **functions** to process all files from the **song** and **log** datasets.
- Use these functions to **load the full datasets** into your database.

**Hint:** Reread the code you wrote in **Part 1** and copy the relevant parts from each section.

By the end of this section, your database should be fully populated with all available records from both datasets.

> 💡 **Note:**  
> Be sure to include **docstrings** for each function you write. Clear documentation helps others (and your future self) understand how each function works and how to use it effectively.

In [29]:
# TODO: Complete both of these functions below based on what you've 
#       learned so far.

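def process_song_file(cur, filepath):
    """
    Loads one song JSON file and inserts its song and artist records.

    The file is read as JSON Lines and only its first record is used, since each
    song file holds a single song. Duplicate song_id/artist_id values are skipped
    by the ON CONFLICT clauses in the insert queries. The caller commits.

    Parameters:
    cur (psycopg2.cursor): Cursor for the database connection.
    filepath (str): Path to a single song JSON file.
    """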
    # open song file
    df = pd.read_json(filepath, lines=True)

    # insert song record
    song_data = df[['song_id', 'title', 'artist_id', 'year', 'duration']].values[0].tolist()
    cur.execute(song_table_insert, song_data)
    
    # insert artist record
    artist_data = df[['artist_id', 'artist_name', 'artist_location', 'artist_latitude', 'artist_longitude']].values[0].tolist()
    cur.execute(artist_table_insert, artist_data)


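def process_log_file(cur, filepath):
    """
    Loads one log JSON file and inserts its time, user, and songplay records.

    Only NextSong events are kept. For each event the function inserts a time
    row (skipping duplicate start_time values), upserts the user (updating
    level on conflict), and inserts a songplay row whose song_id and artist_id
    are looked up with song_select (None when no match is found). The caller
    commits.

    Parameters:
    cur (psycopg2.cursor): Cursor for the database connection.
    filepath (str): Path to a single log JSON file.
    """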
    # open log file
    df = pd.read_json(filepath, lines=True)

    # filter by NextSong action
    df = df[df['page'] == 'NextSong']

    # convert timestamp column to datetime
    t = pd.to_datetime(df['ts'], unit='ms')
    
    # insert time data records
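    # Series.dt.week was removed in pandas 2.0; isocalendar().week (pandas >= 1.1) is equivalent
    week = t.dt.isocalendar().week.astype(int) if hasattr(t.dt, 'isocalendar') else t.dt.week
    time_data = (df['ts'], t.dt.hour, t.dt.day, week, t.dt.month, t.dt.year, t.dt.weekday)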
    column_labels = ('start_time', 'hour', 'day', 'week', 'month', 'year', 'weekday')
    time_df = pd.DataFrame(data=dict(zip(column_labels, time_data)))

    for i, row in time_df.iterrows():
        cur.execute(time_table_insert, list(row))

    # load user table
    user_df = df[['userId', 'firstName', 'lastName', 'gender', 'level']]

    # insert user records (a plain list fills the %s placeholders by position)
    for i, row in user_df.iterrows():
        cur.execute(user_table_insert, list(row))

    # insert songplay records
    for index, row in df.iterrows():
        
        # get songid and artistid from song and artist tables
        cur.execute(song_select, (row.song, row.artist, row.length))
        results = cur.fetchone()
        
        if results:
            songid, artistid = results
        else:
            songid, artistid = None, None

        # insert songplay record
        songplay_data = (row['ts'],
                         row['userId'],
                         row['level'],
                         songid, artistid,
                         row['sessionId'],
                         row['location'],
                         row['userAgent'])
        cur.execute(songplay_table_insert, songplay_data)

In [30]:
from tqdm import tqdm

def process_data(cur, conn, filepath, func):
    """
    Processes all JSON files in the given directory using the specified function.

    This function walks through all subdirectories of the given filepath, finds all JSON files,
    and applies the provided processing function to each file. It commits changes to the database
    after processing each file.

    Parameters:
    cur (psycopg2.cursor): Cursor for the database connection.
    conn (psycopg2.connection): Active connection to the database.
    filepath (str): Path to the root directory containing JSON files.
    func (function): Function to apply to each file, typically for data extraction and insertion.
    """
    # get all files matching extension from directory
    all_files = []
    for root, dirs, files in os.walk(filepath):
        files = glob.glob(os.path.join(root,'*.json'))
        for f in files :
            all_files.append(os.path.abspath(f))

    # get total number of files found
    num_files = len(all_files)
    print('{} files found in {}'.format(num_files, filepath))

    # iterate over files and process
    for datafile in tqdm(all_files, desc=f'Processing {filepath}', unit='file'):
        func(cur, datafile)
        conn.commit()

def truncate_tables(cur, conn):
    """
    Removes all data from key tables in the database to reset the ETL process.

    This function truncates the following tables: songplays, songs, artists, users, and time.
    It uses the CASCADE option so that any other table with a foreign-key reference to one
    of these tables is truncated too, instead of the TRUNCATE failing.

    Parameters:
    cur (psycopg2.cursor): Cursor for the database connection.
    conn (psycopg2.connection): Active connection to the database.
    """
    tables = ['songplays', 'songs', 'artists', 'users', 'time']
    
    for table in tables:
        cur.execute(f'TRUNCATE TABLE {table} CASCADE;')
        conn.commit()
    
    print("✅ All tables truncated successfully.")
    

def main():
    """
    Entry point of the ETL pipeline.

    This function establishes a database connection and executes the full ETL process:
    - It empties all tables with `truncate_tables` so each run starts from a clean database.
    - It processes all song data files using `process_song_file`.
    - It processes all log data files using `process_log_file`.
    - It closes the database connection when finished.
    """
    conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
    cur = conn.cursor()

    truncate_tables(cur, conn)
    process_data(cur, conn, filepath='data/song_data', func=process_song_file)
    process_data(cur, conn, filepath='data/log_data', func=process_log_file)

    conn.close()
    
main()

✅ All tables truncated successfully.
71 files found in data/song_data
Processing data/song_data: 100%|██████████| 71/71 [00:00<00:00, 139.23file/s]
30 files found in data/log_data
Processing data/log_data: 100%|██████████| 30/30 [00:06<00:00,  4.06file/s]


# Final Test

Query the `songplays` table for rows where both `song_id` and `artist_id` are non-null.

If the ETL ran correctly, **exactly one row** meets this requirement.

In [31]:
conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

cur.execute("select * from songplays WHERE song_id is not null and artist_id is not null")
results = cur.fetchall()
print("Result of `select * from songplays WHERE song_id is not null and artist_id is not null`:")
print(results)
assert len(results) == 1

conn.close()

Result of `select * from songplays WHERE song_id is not null and artist_id is not null`:
[(20834, 1542837407796, 15, 'paid', 'SOZCTXZ12AB0182364', 'AR5KOSW1187FB35FF4', 818, 'Chicago-Naperville-Elgin, IL-IN-WI', '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/36.0.1985.125 Chrome/36.0.1985.125 Safari/537.36"')]
