## Summary

Transform the CSV file of Tidy Tuesday tweets into a DuckDB database.

## Dependencies

In [6]:
import shutil
import tempfile
import urllib.request
import pandas as pd
import duckdb

## Global constants

In [4]:
TWEETS_URL = ('https://raw.githubusercontent.com/rfordatascience/tidytuesday/'
              + 'master/tidytuesday_tweets/data.csv')

## Main

### Import the CSV file

In [20]:
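# Stream the remote CSV into a local temporary file.
# delete=False keeps the file on disk after the block so it can be reopened by name;
# it is not removed automatically afterwards.
with urllib.request.urlopen(TWEETS_URL) as response:
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        shutil.copyfileobj(response, tmp_file)

# Parse the downloaded file; low_memory=False reads each column in one pass
# so that dtype inference is consistent across the whole file.
with open(tmp_file.name, encoding='utf8') as csv_f:
    tweets = pd.read_csv(csv_f, low_memory=False)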

tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17284 entries, 0 to 17283
Data columns (total 92 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   week                     17284 non-null  int64  
 1   user_id                  17284 non-null  int64  
 2   status_id                17284 non-null  int64  
 3   created_at               17284 non-null  object 
 4   screen_name              17284 non-null  object 
 5   text                     17284 non-null  object 
 6   source                   17283 non-null  object 
 7   display_text_width       17284 non-null  int64  
 8   reply_to_status_id       2244 non-null   float64
 9   reply_to_user_id         2385 non-null   float64
 10  reply_to_screen_name     2385 non-null   object 
 11  is_quote                 17284 non-null  bool   
 12  is_retweet               17284 non-null  bool   
 13  favorite_count           17284 non-null  int64  
 14  retweet_count         
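
The `info()` output above shows `reply_to_status_id` and `reply_to_user_id` inferred as `float64`, because the columns contain missing values. A 64-bit float cannot represent every 18-digit identifier exactly, so some reply IDs may be rounded. The sketch below illustrates the effect on a two-row stand-in CSV (`SAMPLE_CSV` is a hypothetical sample built for this example, not the real file) and shows one possible workaround, reading the column as text. Whether the notebook needs this depends on how the reply IDs are used downstream.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the tweets CSV; only the columns
# needed to show the effect are included.
SAMPLE_CSV = (
    "status_id,reply_to_status_id\n"
    "980921600429252608,\n"
    "980921607492382722,980921600429252608\n"
    "980921611980300288,980921607492382722\n"
)

# Default inference: the empty cell forces the column to float64.
as_float = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Reading the column as text keeps every digit of the identifier.
as_text = pd.read_csv(io.StringIO(SAMPLE_CSV),
                      dtype={"reply_to_status_id": str})

print(as_float["reply_to_status_id"].dtype)
print(int(as_float.loc[2, "reply_to_status_id"]))  # rounded by float64
print(as_text.loc[2, "reply_to_status_id"])        # exact digits
```

The same `dtype` argument could be passed to the `pd.read_csv` call in the cell above for both reply columns, if exact identifiers matter.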

In [19]:
tweets.head()

,week,user_id,status_id,created_at,screen_name,text,source,display_text_width,reply_to_status_id,reply_to_user_id,...,favourites_count,account_created_at,verified,profile_url,profile_expanded_url,account_lang,profile_banner_url,profile_background_url,profile_image_url,ext_alt_text
0,14,1241814552,980921600429252608,2018-04-02 21:35:08,thomas_mock,Happy to announce the newest #R4DS online lear...,Twitter Web Client,270,,,...,29237,2013-03-04 17:35:03,False,https://t.co/uCjZToIacP,https://themockup.blog,,https://pbs.twimg.com/profile_banners/12418145...,http://abs.twimg.com/images/themes/theme1/bg.png,http://pbs.twimg.com/profile_images/9515783812...,
1,14,1241814552,980921607492382722,2018-04-02 21:35:10,thomas_mock,MOST IMPORTANT RULE!\n\n0. Have fun! Connect w...,Twitter Web Client,198,9.809216e+17,1241815000.0,...,29237,2013-03-04 17:35:03,False,https://t.co/uCjZToIacP,https://themockup.blog,,https://pbs.twimg.com/profile_banners/12418145...,http://abs.twimg.com/images/themes/theme1/bg.png,http://pbs.twimg.com/profile_images/9515783812...,
2,14,1241814552,980921611980300288,2018-04-02 21:35:11,thomas_mock,4. Use the hashtag #TidyTuesday on Twitter if ...,Twitter Web Client,101,9.809216e+17,1241815000.0,...,29237,2013-03-04 17:35:03,False,https://t.co/uCjZToIacP,https://themockup.blog,,https://pbs.twimg.com/profile_banners/12418145...,http://abs.twimg.com/images/themes/theme1/bg.png,http://pbs.twimg.com/profile_images/9515783812...,
3,14,1241814552,980950802759069697,2018-04-02 23:31:11,thomas_mock,@umairdurrani87 That’s awesome! Make sure to t...,Twitter for iPhone,72,9.809419e+17,7.787731e+17,...,29237,2013-03-04 17:35:03,False,https://t.co/uCjZToIacP,https://themockup.blog,,https://pbs.twimg.com/profile_banners/12418145...,http://abs.twimg.com/images/themes/theme1/bg.png,http://pbs.twimg.com/profile_images/9515783812...,
4,14,778773117036617732,980964560512405505,2018-04-03 00:25:51,umairdurrani87,#TidyTuesday week one. Plot of average tuition...,Twitter Web Client,92,,,...,3463,2016-09-22 01:49:14,False,https://t.co/wBRQELao7i,http://durraniu.github.io,,,http://abs.twimg.com/images/themes/theme1/bg.png,http://pbs.twimg.com/profile_images/1284666844...,
