# The JSON Import notebook

The original data, provided by the previous student on this project, consisted of 10 files containing tweets from the periods 2012-2013 through 2021-2022. To simplify the analysis, this notebook reads all 10 .txt files, which contain JSON, combines their contents into a single dataframe, and saves that dataframe as a Feather file for use in the other notebooks.

#### Disclaimer 1: To ensure no personal data is shown, screen names and tweet content are never displayed in this notebook
#### Disclaimer 2: This notebook has no output. It was run once on the RUG LWP computer at the start of the project to extract the data. It now lives on the Habrok server, where the original .txt files are not present, so it cannot be run again. It is kept only to show how the data was extracted and loaded into a dataframe

Libraries needed:

In [None]:
# !pip install pyarrow pandas
# Uncomment the line above to install the libraries if they are not already installed

In [None]:
import pandas as pd # Used for data manipulation and analysis
import json # Used to parse the JSON content of the .txt files
import glob # Used to look up files

### Reading the files and creating a dataframe to save
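
The parsing step used in the cell below can be sketched on in-memory text, assuming each line holds one JSON object. The sample lines and the helper name `parse_json_lines` are hypothetical and only serve as an illustration; they are not part of the original pipeline.

```python
import io
import json

def parse_json_lines(handle):
    """Parse one JSON object per line; return (records, bad_line_numbers)."""
    records, bad_lines = [], []
    for line_number, line in enumerate(handle, start=1):
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Keep only the line number so no line content is displayed
            bad_lines.append(line_number)
    return records, bad_lines

# Made-up sample: two valid lines and one malformed line
sample = io.StringIO('{"id": 1}\nnot json\n{"id": 2}\n')
records, bad_lines = parse_json_lines(sample)
print(len(records), bad_lines)
```

Collecting line numbers instead of printing the offending line is one way to stay consistent with Disclaimer 1.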

In [None]:
# Create an empty list to collect the parsed records
data_list = []

# Specify where the .txt files are stored
file_list = glob.glob('/home/s4029763/TweetData/TwitterGEDv2/*.txt')


for file in file_list:
    with open(file, 'r', encoding='utf-8') as f:
        # Read the file line by line
        for line_number, line in enumerate(f, start=1):
            try:
                # Parse each line as JSON
                data = json.loads(line.strip())
                # Add the parsed record to data_list
                data_list.append(data)
            except json.JSONDecodeError:
                # Report only the location, so no tweet content is displayed
                print(f"Error decoding JSON in file {file} on line {line_number}")

# Turn the json list into a normalized dataframe
groningen_complete = pd.json_normalize(data_list)
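
As a rough illustration of what `pd.json_normalize` does with nested records, the sketch below uses two made-up records. The field names are placeholders and are not the actual tweet schema.

```python
import pandas as pd

# Made-up records for illustration only; not the real tweet schema
records = [
    {"id": 1, "user": {"followers": 10, "verified": False}},
    {"id": 2, "user": {"followers": 25}},
]

flat = pd.json_normalize(records)

# Nested keys become dot-separated column names; keys missing
# from a record show up as NaN in that row
print(sorted(flat.columns))
```

Each nested dictionary is flattened into its own columns, so the resulting dataframe has one row per record and one column per leaf key.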

In [None]:
groningen_complete.info()

In [21]:
# Save it as a feather file
groningen_complete.to_feather("/home/s4029763/Final Folder/gaswinning_tweets_compleet.feather")