<center>
<img src="https://laelgelcpublic.s3.sa-east-1.amazonaws.com/lael_50_years_narrow_white.png.no_years.400px_96dpi.png" width="300" alt="LAEL 50 years logo">
<h3>APPLIED LINGUISTICS GRADUATE PROGRAMME (LAEL)</h3>
</center>
<hr>

# Programme to uncompress the archives

## Prerequisites

### Environment variables

AWS credentials and other parameters should be stored in the `.env` file.

In [None]:
AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY_ID>
AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_ACCESS_KEY>
REGION_NAME=<YOUR_REGION_NAME>
SOURCE_BUCKET_NAME=<YOUR_SOURCE_BUCKET_NAME>
DESTINATION_BUCKET_NAME=<YOUR_DESTINATION_BUCKET_NAME>
INPUT_DIRECTORY=<YOUR_INPUT_DIRECTORY>
OUTPUT_DIRECTORY=<YOUR_OUTPUT_DIRECTORY>
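
Because `unarchive.py` reads these variables with `os.environ[...]`, a missing entry only surfaces as a `KeyError` at the first line that needs it. A minimal sketch of an up-front check, assuming the variable names listed above; the helper name `missing_env_vars` is hypothetical and not part of `unarchive.py`:

```python
import os

# Variable names taken from the '.env' template above
REQUIRED_VARS = [
    'AWS_ACCESS_KEY_ID',
    'AWS_SECRET_ACCESS_KEY',
    'REGION_NAME',
    'SOURCE_BUCKET_NAME',
    'DESTINATION_BUCKET_NAME',
    'INPUT_DIRECTORY',
    'OUTPUT_DIRECTORY',
]


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in 'required' that are unset or empty in 'env'."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]


# Report any variable that is still missing (hypothetical usage)
missing = missing_env_vars()
if missing:
    print('Missing environment variables:', ', '.join(missing))
```

Such a check would run right after `load_dotenv()`, before any S3 call is made.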

### Required libraries

The required libraries are listed in the `unarchive.req` file.
- Create the environment `my_env`: `python3 -m venv my_env`
- Activate `my_env`: `cd my_env && source bin/activate`
- Install the libraries: `pip install -r unarchive.req`

#### Contents of `unarchive.req`

In [None]:
python-dotenv
boto3
pandas

### Execution in the background

In [None]:
nohup python -u unarchive.py &

## Code of `unarchive.py`

In [None]:
# Edit the file '.env' and provide the required parameters
# Install the required libraries in the environment by executing: 'pip install -r unarchive.req'

# Importing the required libraries
from dotenv import load_dotenv
import boto3
import pandas as pd
import tarfile
import bz2
import os
import sys
import shutil
import datetime

load_dotenv()  # This line brings all environment variables from '.env' into 'os.environ'

# Define the name of the CSV file containing the list of S3 keys
key_list = 'unarchive_key_list_test.csv'
#key_list = 'unarchive_key_list_2011.csv'
#key_list = 'unarchive_key_list_2012.csv'
#key_list = 'unarchive_key_list_2013.csv'
#key_list = 'unarchive_key_list_2014.csv'
#key_list = 'unarchive_key_list_2015.csv'
#key_list = 'unarchive_key_list_2016.csv'
#key_list = 'unarchive_key_list_2017.csv'
#key_list = 'unarchive_key_list_2018.csv'
#key_list = 'unarchive_key_list_2019.csv'
#key_list = 'unarchive_key_list_2020.csv'
#key_list = 'unarchive_key_list_2021.csv'
#key_list = 'unarchive_key_list_2022.csv'
#key_list = 'unarchive_key_list_2023.csv'

# Set up AWS credentials
aws_access_key_id = os.environ['AWS_ACCESS_KEY_ID']
aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY']
region_name = os.environ['REGION_NAME']

# Set up S3 client
s3 = boto3.client('s3', aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, region_name=region_name)

# Set up the source and destination S3 bucket names
source_bucket_name = os.environ['SOURCE_BUCKET_NAME']
destination_bucket_name = os.environ['DESTINATION_BUCKET_NAME']

# Define the name of the directory where the downloaded files will be stored
input_directory = os.environ['INPUT_DIRECTORY']

# Recreate the input directory: remove it and its contents if it already exists, then create it.
if os.path.exists(input_directory):
    shutil.rmtree(input_directory)
    print('Old input directory successfully removed.')
try:
    os.makedirs(input_directory)
    print('Input directory successfully created.')
except OSError as e:
    print('Failed to create the directory:', e)
    sys.exit(1)

# Define the name of the directory where the unarchived files will be stored
output_directory = os.environ['OUTPUT_DIRECTORY']

# Recreate the output directory: remove it and its contents if it already exists, then create it.
if os.path.exists(output_directory):
    shutil.rmtree(output_directory)
    print('Old output directory successfully removed.')
try:
    os.makedirs(output_directory)
    print('Output directory successfully created.')
except OSError as e:
    print('Failed to create the directory:', e)
    sys.exit(1)

# Read the key CSV file into a pandas DataFrame
df = pd.read_csv(key_list, header=0)

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    tar_file_key = row['filename-destination']
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print(timestamp, ': Downloading ' + tar_file_key)
    s3.download_file(source_bucket_name, tar_file_key, input_directory + '/' + tar_file_key)
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print(timestamp, ': Extracting ' + tar_file_key)
    with tarfile.open(input_directory + '/' + tar_file_key, 'r') as tar:
        tar.extractall(path=output_directory)
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print(timestamp, ': Extracting and transferring .bz2 files to S3 for ' + tar_file_key)
    # Iterate over the extracted files
    for root, dirs, files in os.walk(output_directory):
        for file in files:
            if file.endswith('.bz2'):
                # Uncompress each .bz2 file
                with bz2.open(os.path.join(root, file), 'rb') as bz_file:
                    uncompressed_data = bz_file.read()
                    
                    # Remove the .bz2 extension from the filename
                    uncompressed_filename = file[:-4]
                    
                    # Get the relative path of the file within the directory tree
                    relative_path = os.path.relpath(os.path.join(root, file), '.')
                    
                    # Build the destination key (the relative path of the .bz2 file, kept as a folder, followed by the
                    # uncompressed filename) and upload the uncompressed data to the destination S3 bucket
                    destination_key = os.path.join(relative_path, uncompressed_filename)
                    s3.put_object(Body=uncompressed_data, Bucket=destination_bucket_name, Key=destination_key)
                    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
                    print(timestamp, ':', uncompressed_filename, 'transferred')
    shutil.rmtree(input_directory)
    os.makedirs(input_directory)
    shutil.rmtree(output_directory)
    os.makedirs(output_directory)
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print(timestamp, ': Input and output directories cleared out for ' + tar_file_key)
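
Two details of the loop above are easy to miss. First, the destination key appends the uncompressed filename to the full relative path of the `.bz2` file, so the archive name survives as a folder in the key. Second, each file is read fully into memory before it is uploaded. The sketch below reproduces the key layout with a hypothetical helper, `destination_key`, and shows a chunked alternative for the decompression step, `decompress_bz2_to`, also hypothetical. It assumes relative paths such as those produced by `os.walk` on a relative `OUTPUT_DIRECTORY`, and uses `posixpath` so that keys always use forward slashes; the directory name `output` is made up.

```python
import bz2
import io
import posixpath
import shutil


def destination_key(root, file):
    """Mirror the key layout of the loop above for a relative 'root'."""
    relative_path = posixpath.normpath(posixpath.join(root, file))
    uncompressed_filename = file[:-4]  # drop the '.bz2' extension
    return posixpath.join(relative_path, uncompressed_filename)


def decompress_bz2_to(source, target, chunk_size=1024 * 1024):
    """Copy the decompressed contents of 'source' to 'target' in chunks."""
    with bz2.open(source, 'rb') as bz_file:
        shutil.copyfileobj(bz_file, target, chunk_size)


# The '.bz2' name is kept as a folder in the key
print(destination_key('output/2019/01/01/00', '29.json.bz2'))
# output/2019/01/01/00/29.json.bz2/29.json

# Round trip on in-memory data to show the chunked copy
compressed = io.BytesIO(bz2.compress(b'{"id": 1}\n'))
restored = io.BytesIO()
decompress_bz2_to(compressed, restored)
print(restored.getvalue())
```

For large files, the file object returned by `bz2.open` might also be passed to boto3's `upload_fileobj` instead of sending the whole payload through `put_object`; that variant is an assumption and has not been tried here.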


## Code to parse the tweet files ([JSONL](https://spark.apache.org/docs/latest/sql-data-sources-json.html) format) into a Spark DataFrame

### Prototype using a local context instead of a real Spark cluster

In [2]:
from pyspark.sql import SparkSession
from pyspark import SparkContext

# Create a local Spark context
sc = SparkContext('local', 'The Twitter Grab 2019 Corpus')

# Create a Spark session
spark = SparkSession.builder.appName('The Twitter Grab 2019 Corpus').getOrCreate()

# Set the folder path
data_source = 'gelctweets'

# Read the JSONL files into a DataFrame
tweets_df = spark.read.json(data_source)


23/12/18 20:11:31 WARN Utils: Your hostname, blue-nbjupyterhub19 resolves to a loopback address: 127.0.0.1; using 10.0.0.26 instead (on interface ens5)
23/12/18 20:11:31 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/12/18 20:11:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/12/18 20:11:41 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.


In [4]:
# Show the first few rows of the DataFrame
tweets_df.show()

+------------+-----------+--------------------+------+------------------+----------------------+--------------------+--------------+--------------+---------+------------+----+-------------------+-------------------+-----------------------+---------------------+-------------------------+-------------------+-----------------------+---------------+----+-----+------------------+-----------+-------------+----------------+--------------------+-----------------------+-----------+-------------+---------+--------------------+--------------------+--------------------------+-------------+---------+--------------------+---------------------+
|contributors|coordinates|          created_at|delete|display_text_range|              entities|   extended_entities|extended_tweet|favorite_count|favorited|filter_level| geo|                 id|             id_str|in_reply_to_screen_name|in_reply_to_status_id|in_reply_to_status_id_str|in_reply_to_user_id|in_reply_to_user_id_str|is_quote_status|lang|place|poss

                                                                                

In [6]:
# Show the quantity of columns of the DataFrame
len(tweets_df.columns)

38

In [8]:
# Show the quantity of rows (tweets) of the DataFrame
tweets_df.count()

2307

In [10]:
# Show the schema of the DataFrame
tweets_df.printSchema()

root
 |-- contributors: string (nullable = true)
 |-- coordinates: struct (nullable = true)
 |    |-- coordinates: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |    |-- type: string (nullable = true)
 |-- created_at: string (nullable = true)
 |-- delete: struct (nullable = true)
 |    |-- status: struct (nullable = true)
 |    |    |-- id: long (nullable = true)
 |    |    |-- id_str: string (nullable = true)
 |    |    |-- user_id: long (nullable = true)
 |    |    |-- user_id_str: string (nullable = true)
 |    |-- timestamp_ms: string (nullable = true)
 |-- display_text_range: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- entities: struct (nullable = true)
 |    |-- hashtags: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- indices: array (nullable = true)
 |    |    |    |    |-- element: long (containsNull = true)
 |    |    |    |-- text: string (nullable = true)
 |    |-

In [12]:
from pyspark.sql.functions import array_contains

# Define the list of hashtags for DataFrame filtering
hashtags = ['1FIRST', 'RaveNow']

# Create a filtered DataFrame
filtered_tweets_df = tweets_df.filter(
    array_contains('entities.hashtags.text', hashtags[0])
    | array_contains('entities.hashtags.text', hashtags[1])
)
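
The filter keeps a tweet when the array found at `entities.hashtags.text` contains either hashtag. As a rough plain-Python sketch of the same logic on parsed JSONL records (no Spark involved; the function names `hashtag_texts` and `has_any_hashtag` and the sample records are made up for illustration):

```python
import json


def hashtag_texts(tweet):
    """Return the list of hashtag texts of a tweet record, or [] if absent."""
    entities = tweet.get('entities') or {}
    hashtags = entities.get('hashtags') or []
    return [tag.get('text') for tag in hashtags]


def has_any_hashtag(tweet, wanted):
    """True if the tweet carries at least one of the wanted hashtags."""
    texts = hashtag_texts(tweet)
    return any(tag in texts for tag in wanted)


# Illustrative records, not taken from the corpus
lines = [
    '{"id": 1, "entities": {"hashtags": [{"text": "RaveNow", "indices": [0, 8]}]}}',
    '{"id": 2, "entities": {"hashtags": []}}',
    '{"delete": {"status": {"id": 3}}}',
]
tweets = [json.loads(line) for line in lines]
kept = [t for t in tweets if has_any_hashtag(t, ['1FIRST', 'RaveNow'])]
print([t['id'] for t in kept])  # [1]
```

Like `array_contains`, the comparison is an exact, case-sensitive match, and records without an `entities` field (such as delete notices) are simply not selected.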


In [14]:
# Show the first few rows of the DataFrame
filtered_tweets_df.show()

+------------+-----------+--------------------+------+------------------+--------------------+-----------------+--------------+--------------+---------+------------+----+-------------------+-------------------+-----------------------+---------------------+-------------------------+-------------------+-----------------------+---------------+----+-----+------------------+-----------+-------------+----------------+--------------------+-----------------------+-----------+-------------+---------+--------------------+--------------------+--------------------+-------------+---------+--------------------+---------------------+
|contributors|coordinates|          created_at|delete|display_text_range|            entities|extended_entities|extended_tweet|favorite_count|favorited|filter_level| geo|                 id|             id_str|in_reply_to_screen_name|in_reply_to_status_id|in_reply_to_status_id_str|in_reply_to_user_id|in_reply_to_user_id_str|is_quote_status|lang|place|possibly_sensitive|q

                                                                                

In [16]:
# Show the quantity of columns of the DataFrame
len(filtered_tweets_df.columns)

38

In [18]:
# Show the quantity of rows (tweets) of the DataFrame
filtered_tweets_df.count()

2

In [20]:
# Show the schema of the DataFrame
filtered_tweets_df.printSchema()

root
 |-- contributors: string (nullable = true)
 |-- coordinates: struct (nullable = true)
 |    |-- coordinates: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |    |-- type: string (nullable = true)
 |-- created_at: string (nullable = true)
 |-- delete: struct (nullable = true)
 |    |-- status: struct (nullable = true)
 |    |    |-- id: long (nullable = true)
 |    |    |-- id_str: string (nullable = true)
 |    |    |-- user_id: long (nullable = true)
 |    |    |-- user_id_str: string (nullable = true)
 |    |-- timestamp_ms: string (nullable = true)
 |-- display_text_range: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- entities: struct (nullable = true)
 |    |-- hashtags: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- indices: array (nullable = true)
 |    |    |    |    |-- element: long (containsNull = true)
 |    |    |    |-- text: string (nullable = true)
 |    |-

In [22]:
# Export the DataFrame to JSONL format
filtered_tweets_df.write.mode('overwrite').json('filtered_tweets.jsonl')
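
Spark writes the result as a directory (here named `filtered_tweets.jsonl`) that holds one or more `part-*` files plus marker files such as `_SUCCESS`, rather than a single file. A minimal sketch for concatenating the parts into one JSONL file on a local filesystem; the helper name `merge_part_files` and the merged file name are assumptions:

```python
import glob
import os
import shutil


def merge_part_files(spark_output_dir, merged_path):
    """Concatenate Spark 'part-*' files into a single file; return the count."""
    part_files = sorted(glob.glob(os.path.join(spark_output_dir, 'part-*')))
    with open(merged_path, 'wb') as merged:
        for part in part_files:
            with open(part, 'rb') as source:
                shutil.copyfileobj(source, merged)
    return len(part_files)


# Only runs when the Spark output directory from the cell above exists
if os.path.isdir('filtered_tweets.jsonl'):
    count = merge_part_files('filtered_tweets.jsonl', 'filtered_tweets_merged.jsonl')
    print(count, 'part file(s) merged')
```

Since each part file already contains one JSON object per line, plain concatenation yields a valid JSONL file.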

                                                                                

### Code on a real Spark cluster over Amazon EMR

In [1]:
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("The Twitter Grab 2019 Corpus").getOrCreate()

# Set the S3 bucket and folder paths
source_bucket = 'gelctweets'
year = '2019'
month = '01'
#data_source = 's3://' + source_bucket + '/' + year + '_' + month + '/01/00/**/*.json'
data_source = 's3://' + source_bucket + '/' + year + '_' + month + '/01/00/29.json.bz2/*.json'

# Read the JSONL files into a DataFrame
tweets_spark_df = spark.read.json(data_source)


Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,User,Current session?
0,application_1702944008462_0001,pyspark,idle,Link,Link,,✔


SparkSession available as 'spark'.
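
The source path in the cell above is assembled from the bucket name, a `<year>_<month>` folder, two further folder levels, and a pattern for the files. A small sketch with a hypothetical helper, `build_data_source`, makes the pieces explicit; reading the two folder levels as day and hour is an assumption inferred from the paths in that cell, and the parameter names are made up:

```python
def build_data_source(bucket, year, month, day, hour, pattern='**/*.json'):
    """Build an S3 path following the layout used in the cell above."""
    return f's3://{bucket}/{year}_{month}/{day}/{hour}/{pattern}'


# Reproduces the two paths from the cell above
print(build_data_source('gelctweets', '2019', '01', '01', '00'))
print(build_data_source('gelctweets', '2019', '01', '01', '00', '29.json.bz2/*.json'))
```

Keeping the pattern as a parameter makes it easy to switch between a single archive folder for testing and a wider glob for a full run.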


In [2]:
# Show the first few rows of the DataFrame
tweets_spark_df.show()

+------------+-----------+--------------------+------+------------------+----------------------+--------------------+--------------+--------------+---------+------------+----+-------------------+-------------------+-----------------------+---------------------+-------------------------+-------------------+-----------------------+---------------+----+-----+------------------+-----------+-------------+----------------+--------------------+-----------------------+-----------+-------------+---------+--------------------+--------------------+--------------------------+-------------+---------+--------------------+---------------------+
|contributors|coordinates|          created_at|delete|display_text_range|              entities|   extended_entities|extended_tweet|favorite_count|favorited|filter_level| geo|                 id|             id_str|in_reply_to_screen_name|in_reply_to_status_id|in_reply_to_status_id_str|in_reply_to_user_id|in_reply_to_user_id_str|is_quote_status|lang|place|poss

In [3]:
# Show the quantity of columns of the DataFrame
len(tweets_spark_df.columns)

38

In [4]:
# Show the quantity of rows (tweets) of the DataFrame
tweets_spark_df.count()

2307

In [5]:
# Show the schema of the DataFrame
tweets_spark_df.printSchema()

root
 |-- contributors: string (nullable = true)
 |-- coordinates: struct (nullable = true)
 |    |-- coordinates: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |    |-- type: string (nullable = true)
 |-- created_at: string (nullable = true)
 |-- delete: struct (nullable = true)
 |    |-- status: struct (nullable = true)
 |    |    |-- id: long (nullable = true)
 |    |    |-- id_str: string (nullable = true)
 |    |    |-- user_id: long (nullable = true)
 |    |    |-- user_id_str: string (nullable = true)
 |    |-- timestamp_ms: string (nullable = true)
 |-- display_text_range: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- entities: struct (nullable = true)
 |    |-- hashtags: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- indices: array (nullable = true)
 |    |    |    |    |-- element: long (containsNull = true)
 |    |    |    |-- text: string (nullable = true)
 |    |-

In [6]:
from pyspark.sql.functions import array_contains

# Define the list of hashtags for DataFrame filtering
hashtags = ['1FIRST', 'RaveNow']

# Create a filtered DataFrame
filtered_tweets_spark_df = tweets_spark_df.filter(
    array_contains('entities.hashtags.text', hashtags[0])
    | array_contains('entities.hashtags.text', hashtags[1])
)


In [7]:
# Show the first few rows of the DataFrame
filtered_tweets_spark_df.show()

+------------+-----------+--------------------+------+------------------+--------------------+-----------------+--------------+--------------+---------+------------+----+-------------------+-------------------+-----------------------+---------------------+-------------------------+-------------------+-----------------------+---------------+----+-----+------------------+-----------+-------------+----------------+--------------------+-----------------------+-----------+-------------+---------+--------------------+--------------------+--------------------+-------------+---------+--------------------+---------------------+
|contributors|coordinates|          created_at|delete|display_text_range|            entities|extended_entities|extended_tweet|favorite_count|favorited|filter_level| geo|                 id|             id_str|in_reply_to_screen_name|in_reply_to_status_id|in_reply_to_status_id_str|in_reply_to_user_id|in_reply_to_user_id_str|is_quote_status|lang|place|possibly_sensitive|q

In [8]:
# Show the quantity of rows (tweets) of the DataFrame
filtered_tweets_spark_df.count()

2

In [9]:
# Show the schema of the DataFrame
filtered_tweets_spark_df.printSchema()

root
 |-- contributors: string (nullable = true)
 |-- coordinates: struct (nullable = true)
 |    |-- coordinates: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |    |-- type: string (nullable = true)
 |-- created_at: string (nullable = true)
 |-- delete: struct (nullable = true)
 |    |-- status: struct (nullable = true)
 |    |    |-- id: long (nullable = true)
 |    |    |-- id_str: string (nullable = true)
 |    |    |-- user_id: long (nullable = true)
 |    |    |-- user_id_str: string (nullable = true)
 |    |-- timestamp_ms: string (nullable = true)
 |-- display_text_range: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- entities: struct (nullable = true)
 |    |-- hashtags: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- indices: array (nullable = true)
 |    |    |    |    |-- element: long (containsNull = true)
 |    |    |    |-- text: string (nullable = true)
 |    |-

In [10]:
# Export the DataFrame to JSONL format
output_path = 's3://gelcawsemr/2019_01_01_00/filtered_tweets.jsonl'
filtered_tweets_spark_df.write.mode('overwrite').json(output_path)

# snscrape format

In [None]:
import pandas as pd

# Read the JSON file into a DataFrame
df1 = pd.read_json('snscrape.json', lines=True)

In [None]:
df1.info()

In [None]:
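import json

# Take the quoted tweet stored in row 7 and round-trip it through JSON to obtain a plain dict
quoted_tweet = df1.loc[7, 'quotedTweet']
quoted_tweet_str = json.dumps(quoted_tweet)
quoted_tweet_dict = json.loads(quoted_tweet_str)

# Print the media attached to the quoted tweet
media = quoted_tweet_dict['media']
print(media)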

In [None]:
df1

# Internet Archive format

In [88]:
import pandas as pd

# Read the JSON file into a DataFrame
df2 = pd.read_json('intarch.json', lines=True)


In [90]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 37 columns):
 #   Column                     Non-Null Count  Dtype              
---  ------                     --------------  -----              
 0   created_at                 176 non-null    datetime64[ns, UTC]
 1   id                         176 non-null    float64            
 2   id_str                     176 non-null    float64            
 3   text                       176 non-null    object             
 4   display_text_range         45 non-null     object             
 5   source                     176 non-null    object             
 6   truncated                  176 non-null    float64            
 7   in_reply_to_status_id      38 non-null     float64            
 8   in_reply_to_status_id_str  38 non-null     float64            
 9   in_reply_to_user_id        39 non-null     float64            
 10  in_reply_to_user_id_str    39 non-null     float64            
 11  in_rep
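
The `info()` output shows `id` and `id_str` stored as `float64`, presumably because the rows without an ID force a floating-point column. A `float64` cannot represent every 19-digit tweet ID exactly. The following sketch, using only the standard library and a made-up ID, illustrates the loss and one way around it, namely keeping `id_str` as a string when parsing the JSONL lines:

```python
import json

# Made-up 19-digit ID, same magnitude as the IDs shown above
original_id = 1080003000000000001

# float64 has a 53-bit significand, so the round trip changes the value
as_float = float(original_id)
print(int(as_float) == original_id)  # False

# json.loads keeps integers exact and leaves 'id_str' as a string
line = '{"id": 1080003000000000001, "id_str": "1080003000000000001"}'
record = json.loads(line)
print(record['id'] == original_id)   # True
print(record['id_str'])
```

pandas' `read_json` has a `dtype` parameter that may help keep these columns exact, though its effect on this file has not been verified in this notebook.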

In [92]:
df2

Unnamed: 0,created_at,id,id_str,text,display_text_range,source,truncated,in_reply_to_status_id,in_reply_to_status_id_str,in_reply_to_user_id,...,timestamp_ms,retweeted_status,extended_entities,possibly_sensitive,quoted_status_id,quoted_status_id_str,quoted_status,quoted_status_permalink,extended_tweet,delete
0,2019-01-01 07:29:00+00:00,1.080003e+18,1.080003e+18,@supr_dorapo 最新のゼルダやってないけどゼルダも好きですﾋﾟｭｰﾋﾟｭｰﾋﾟｭﾋ...,"[13, 58]","<a href=""http://twitter.com/download/android"" ...",0.0,1.080003e+18,1.080003e+18,2.787406e+09,...,2019-01-01 07:28:07.168,,,,,,,,,
1,2019-01-01 07:29:00+00:00,1.080003e+18,1.080003e+18,そうだよ、僕がピーマンだよ,,"<a href=""http://twittbot.net/"" rel=""nofollow"">...",0.0,,,,...,2019-01-01 07:28:07.168,,,,,,,,,
2,2019-01-01 07:29:00+00:00,1.080003e+18,1.080003e+18,@subaru_yamamoto 勝ち退くが利って、それ得意なひとTwitterにたくさんい...,"[17, 66]","<a href=""http://twitter.com/download/iphone"" r...",0.0,1.080002e+18,1.080002e+18,3.955496e+09,...,2019-01-01 07:28:07.168,,,,,,,,,
3,2019-01-01 07:29:00+00:00,1.080003e+18,1.080003e+18,RT @PPGcnjp: 初もうで😊 https://t.co/NOfd4q1ro5,,"<a href=""http://twitter.com/download/iphone"" r...",0.0,,,,...,2019-01-01 07:28:07.168,{'created_at': 'Tue Jan 01 07:24:00 +0000 2019...,"{'media': [{'id': 1076011587624873984, 'id_str...",1.0,,,,,,
4,2019-01-01 07:29:00+00:00,1.080003e+18,1.080003e+18,RT @DJ_GINTA_NEW: 【お正月の歌作ったよん🎍】 https://t.co/e...,,"<a href=""http://twitter.com/download/iphone"" r...",0.0,,,,...,2019-01-01 07:28:07.168,{'created_at': 'Tue Jan 01 07:22:06 +0000 2019...,"{'media': [{'id': 1080001090727247872, 'id_str...",0.0,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,2019-01-01 07:29:04+00:00,1.080003e+18,1.080003e+18,@Fams000 @Rouguii77 +mention inch'allah bonne ...,"[20, 59]","<a href=""http://twitter.com/download/android"" ...",0.0,1.079895e+18,1.079895e+18,2.878622e+09,...,2019-01-01 07:28:07.168,,,,,,,,,
196,2019-01-01 07:29:04+00:00,1.080003e+18,1.080003e+18,RT @Sphenodontiaart: hope to god they kiss for...,,"<a href=""http://twitter.com/download/iphone"" r...",0.0,,,,...,2019-01-01 07:28:07.168,{'created_at': 'Mon Dec 31 06:09:07 +0000 2018...,"{'media': [{'id': 1079620371505065984, 'id_str...",0.0,,,,,,
197,2019-01-01 07:29:04+00:00,1.080003e+18,1.080003e+18,RT @fuzakefactory: 明けましておめでとうございます🐗\n新年お年玉企画🎉\...,,"<a href=""http://twitter.com/download/iphone"" r...",0.0,,,,...,2019-01-01 07:28:07.168,{'created_at': 'Tue Jan 01 06:45:32 +0000 2019...,,,,,,,,
198,2019-01-01 07:29:04+00:00,1.080003e+18,1.080003e+18,Feliz año nuevo! Happy new year! #2019 @ Nuevo...,,"<a href=""http://instagram.com"" rel=""nofollow"">...",0.0,,,,...,2019-01-01 07:28:07.168,,,0.0,,,,,,
