Loading the WORM bucket. I'll start with s3fs. Note that I have already configured my AWS credentials in the AWS CLI.
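Because the keys are already configured for the AWS CLI, they live in the CLI's shared credentials file (by default `~/.aws/credentials`). s3fs and boto3 resolve credentials the same way botocore does, and that chain reads this file, so `s3fs.S3FileSystem(anon=False)` with no `key`/`secret` should pick them up without pasting secrets into the notebook. The sketch below only illustrates what that file holds; `read_cli_credentials` is a hypothetical helper, and the `default` profile name is an assumption.

```python
import configparser
import os


def read_cli_credentials(path=None, profile="default"):
    """Return (access_key_id, secret_access_key) from an AWS CLI credentials file.

    Hypothetical helper for illustration only; boto3 and s3fs already read
    this file themselves when no explicit keys are passed.
    """
    path = path or os.path.expanduser("~/.aws/credentials")
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    section = parser[profile]  # raises KeyError if the profile is missing
    return section["aws_access_key_id"], section["aws_secret_access_key"]
```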

In [22]:
import s3fs
fs = s3fs.S3FileSystem(anon=False,key='###########',secret='##############')

In [3]:
!aws s3 ls # Use the AWS CLI to list buckets available.  worm-begin has Object Lock enabled.

2021-04-06 21:03:54 music-demo-lyrics
2021-04-06 20:10:43 music-lyrics
2021-04-07 07:23:53 worm-begin


I'm finding that s3fs is geared more toward reading files from S3 than toward writing local files to it, so I'll try boto3 instead.

In [23]:
import os
import boto3

from dotenv import load_dotenv
load_dotenv(verbose=True)

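def aws_session(region_name='us-east-1'):
    # load_dotenv() above copies the variables from a .env file into os.environ
    # (in a notebook it searches the working directory, then its parents).
    # The secret's variable name must match the one used in the .env file;
    # the conventional name is AWS_SECRET_ACCESS_KEY.
    return boto3.session.Session(aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
                                 aws_secret_access_key=os.getenv('AWS_ACCESS_KEY_SECRET'),
                                 region_name=region_name)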

def make_bucket(name, acl): 
    session = aws_session()
    s3_resource = session.resource('s3')
    return s3_resource.create_bucket(Bucket=name, ACL=acl)

def upload_file_to_bucket(bucket_name, file_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    file_dir, file_name = os.path.split(file_path)

    bucket = s3_resource.Bucket(bucket_name)
    bucket.upload_file(
      Filename=file_path,
      Key=file_name,
      ExtraArgs={'ACL': 'public-read'}
    )

    s3_url = f"https://{bucket_name}.s3.amazonaws.com/{file_name}"
    return s3_url

## s3_url = upload_file_to_bucket('worm-begin','lyrics_25k.csv')
## print(s3_url) 
## s3_url = upload_file_to_bucket('worm-begin','album_details_25k.csv')
## print(s3_url)
## s3_url = upload_file_to_bucket('worm-begin','songs_details_25k.csv')
## print(s3_url)

def download_file_from_bucket(bucket_name, s3_key, dst_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)
    bucket.download_file(Key=s3_key, Filename=dst_path)

## download_file_from_bucket('music-demo-lyrics', 'lyrics_25k.csv', 'short_name.csv')
## with open('short_name.csv') as fo:
    ## print(fo.read())

https://worm-begin.s3.amazonaws.com/lyrics_25k.csv
https://worm-begin.s3.amazonaws.com/album_details_25k.csv
https://worm-begin.s3.amazonaws.com/songs_details_25k.csv
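`upload_file_to_bucket` builds its return value with a bare f-string, which works for these file names but would give a broken link for keys containing spaces or other reserved characters. A minimal sketch of a safer builder follows; `object_url` is hypothetical, it assumes virtual-hosted-style addressing, and the link only resolves for anonymous readers if the object really is public (as the `public-read` ACL above intends).

```python
from urllib.parse import quote


def object_url(bucket_name: str, key: str, region: str = "us-east-1") -> str:
    """Build a virtual-hosted-style S3 object URL (hypothetical helper, not boto3 API)."""
    # Percent-encode the key but keep '/' so prefix "folders" stay readable.
    encoded_key = quote(key, safe="/")
    if region == "us-east-1":
        # Global endpoint form, matching the URLs printed above.
        return f"https://{bucket_name}.s3.amazonaws.com/{encoded_key}"
    return f"https://{bucket_name}.s3.{region}.amazonaws.com/{encoded_key}"


print(object_url("worm-begin", "lyrics_25k.csv"))
# https://worm-begin.s3.amazonaws.com/lyrics_25k.csv
print(object_url("worm-begin", "raw data/lyrics 25k.csv", region="us-west-2"))
# https://worm-begin.s3.us-west-2.amazonaws.com/raw%20data/lyrics%2025k.csv
```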


In [24]:
# Use s3fs for checking the bucket.
fs.ls('worm-begin')

['worm-begin/album_details_25k.csv',
 'worm-begin/lyrics_25k.csv',
 'worm-begin/songs_details_25k.csv']

In [25]:
# use AWS CLI for checking the bucket.
!aws s3 ls 'worm-begin'

2021-04-07 08:38:33     105621 album_details_25k.csv
2021-04-07 08:09:22   27655251 decades_tcc_ceds_music.csv
2021-04-07 08:34:22     348120 genres_artists_data.csv
2021-04-07 08:19:38  276163416 genres_lyrics_data.csv
2021-04-07 08:22:31  187407184 labeled_lyrics_cleaned.csv
2021-04-07 08:38:18   40754845 lyrics_25k.csv
2021-04-07 08:38:34    2276377 songs_details_25k.csv
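The CLI listing is plain text: a date, a time, a size in bytes, and the key on each line. To compare it with what s3fs reports, it helps to parse it into records. A sketch, assuming the default (non-`--human-readable`) output shown above and keys without leading spaces; `parse_s3_ls` and `S3Entry` are hypothetical names.

```python
from datetime import datetime
from typing import List, NamedTuple


class S3Entry(NamedTuple):
    modified: datetime
    size: int  # bytes
    key: str


def parse_s3_ls(output: str) -> List[S3Entry]:
    """Parse `aws s3 ls s3://bucket` text into entries; 'PRE prefix/' lines are skipped."""
    entries = []
    for line in output.splitlines():
        parts = line.split(maxsplit=3)  # date, time, size, key (key may contain spaces)
        if len(parts) != 4:
            continue  # blank lines and 'PRE prefix/' lines have fewer fields
        date, time, size, key = parts
        modified = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        entries.append(S3Entry(modified, int(size), key))
    return entries


listing = """\
2021-04-07 08:38:33     105621 album_details_25k.csv
2021-04-07 08:38:18   40754845 lyrics_25k.csv
"""
entries = parse_s3_ls(listing)
print(sum(e.size for e in entries))  # 40860466
```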


In [9]:
# Upload one more file with the boto3 helper, then check the bucket both ways.
s3_url = upload_file_to_bucket('worm-begin','decades_tcc_ceds_music.csv')
print(s3_url)
fs.ls('worm-begin')
!aws s3 ls 'worm-begin'

https://worm-begin.s3.amazonaws.com/decades_tcc_ceds_music.csv
2021-04-07 08:02:29     105621 album_details_25k.csv
2021-04-07 08:09:22   27655251 decades_tcc_ceds_music.csv
2021-04-07 08:02:16   40754845 lyrics_25k.csv
2021-04-07 08:02:30    2276377 songs_details_25k.csv


In [11]:
# Check reading through s3fs (pandas uses it for s3:// paths).
import pandas as pd
test = pd.read_csv('s3://worm-begin/lyrics_25k.csv')
test.describe(include='all')

Unnamed: 0.1,Unnamed: 0,link,artist,song_name,lyrics
count,25742.0,25742,25742,25742,25742
unique,,25018,542,21073,24883
top,,../lyrics/elvispresley/canthelpfallinginlove.html,Johnny Cash Lyrics,Intro,\n\n[Instrumental]\n
freq,,7,813,22,9
mean,12870.5,,,,
std,7431.219651,,,,
min,0.0,,,,
25%,6435.25,,,,
50%,12870.5,,,,
75%,19305.75,,,,


In [12]:
# Use the AWS CLI to upload files (one shown here).
!aws s3 cp genres_lyrics_data.csv s3://worm-begin

upload: ./genres_lyrics_data.csv to s3://worm-begin/genres_lyrics_data.csv


That was very slow (the file is about 276 MB), but it is just as slow from the terminal.
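The notebook's `!aws s3 cp` runs the same CLI as the terminal, so matching speeds are expected. For large files the CLI switches to multipart upload; its documented defaults are an 8 MB `multipart_threshold` and `multipart_chunksize` and 10 `max_concurrent_requests`, and S3 itself allows at most 10,000 parts of at least 5 MiB each (except the last). The arithmetic below is a sketch that treats "8 MB" as 8 MiB and uses a simplified chunk-growth rule, not the CLI's exact algorithm; `multipart_plan` is a hypothetical helper.

```python
from typing import Tuple

MIB = 1024 * 1024
MAX_PARTS = 10_000       # S3 limit on parts per multipart upload
MIN_PART_SIZE = 5 * MIB  # S3 minimum size for every part except the last


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def multipart_plan(file_size: int, chunk_size: int = 8 * MIB) -> Tuple[int, int]:
    """Return (part_count, chunk_size) for splitting file_size bytes into parts.

    Simplified rule: enforce S3's 5 MiB minimum, and if the chunk would need
    more than MAX_PARTS parts, grow it to the smallest whole-MiB size that fits.
    (The 5 GiB per-part maximum is not checked.)
    """
    chunk_size = max(chunk_size, MIN_PART_SIZE)
    if ceil_div(file_size, chunk_size) > MAX_PARTS:
        chunk_size = ceil_div(ceil_div(file_size, MAX_PARTS), MIB) * MIB
    return ceil_div(file_size, chunk_size), chunk_size


# genres_lyrics_data.csv, size taken from the `aws s3 ls` listing above
print(multipart_plan(276_163_416))  # (33, 8388608)
```

About 33 parts sent roughly 10 at a time means the upstream link, not part count, is the likely bottleneck. If there is spare bandwidth, `aws configure set default.s3.max_concurrent_requests 20` raises the CLI's concurrency; whether it helps here is untested.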

In [14]:
fs.ls('worm-begin')

['worm-begin/album_details_25k.csv',
 'worm-begin/lyrics_25k.csv',
 'worm-begin/songs_details_25k.csv']

In [15]:
!aws s3 ls 'worm-begin'

2021-04-07 08:02:29     105621 album_details_25k.csv
2021-04-07 08:09:22   27655251 decades_tcc_ceds_music.csv
2021-04-07 08:19:38  276163416 genres_lyrics_data.csv
2021-04-07 08:22:31  187407184 labeled_lyrics_cleaned.csv
2021-04-07 08:02:16   40754845 lyrics_25k.csv
2021-04-07 08:02:30    2276377 songs_details_25k.csv


Why does the s3fs listing differ from the AWS CLI listing? I'll recreate the s3fs filesystem, list again, and then try loading a DataFrame from a file that is in the bucket but missing from the s3fs listing.
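A likely explanation: s3fs caches directory listings per filesystem object, and fsspec also caches the filesystem objects themselves, so calling `S3FileSystem(...)` again with the same arguments can return the same instance, stale listing included. That would also explain why reading a file that the listing omits still works. `fs.ls('worm-begin', refresh=True)` or `fs.invalidate_cache('worm-begin')` should force a fresh listing. To see exactly which keys differ, a set comparison is enough; `missing_from_s3fs` is a hypothetical helper that assumes s3fs paths are prefixed with the bucket name, as in the outputs here.

```python
def missing_from_s3fs(bucket, s3fs_paths, cli_keys):
    """Return keys the CLI lists but the (possibly cached) s3fs listing lacks."""
    prefix = f"{bucket}/"
    # Strip the "bucket/" prefix s3fs puts on every path so both sides are bare keys.
    s3fs_keys = {p[len(prefix):] if p.startswith(prefix) else p for p in s3fs_paths}
    return sorted(set(cli_keys) - s3fs_keys)


s3fs_paths = ['worm-begin/album_details_25k.csv',
              'worm-begin/lyrics_25k.csv',
              'worm-begin/songs_details_25k.csv']
cli_keys = ['album_details_25k.csv', 'decades_tcc_ceds_music.csv',
            'genres_lyrics_data.csv', 'labeled_lyrics_cleaned.csv',
            'lyrics_25k.csv', 'songs_details_25k.csv']
print(missing_from_s3fs('worm-begin', s3fs_paths, cli_keys))
# ['decades_tcc_ceds_music.csv', 'genres_lyrics_data.csv', 'labeled_lyrics_cleaned.csv']
```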

In [16]:
fs = s3fs.S3FileSystem(anon=False,key='###########',secret='##############')
fs.ls('worm-begin')

['worm-begin/album_details_25k.csv',
 'worm-begin/lyrics_25k.csv',
 'worm-begin/songs_details_25k.csv']

In [17]:
test2 = pd.read_csv('s3://worm-begin/decades_tcc_ceds_music.csv')

In [18]:
test2.describe(include='all')

Unnamed: 0.1,Unnamed: 0,artist_name,track_name,release_date,genre,lyrics,len,dating,violence,world/life,...,sadness,feelings,danceability,loudness,acousticness,instrumentalness,valence,energy,topic,age
count,28372.0,28372,28372,28372.0,28372,28372,28372.0,28372.0,28372.0,28372.0,...,28372.0,28372.0,28372.0,28372.0,28372.0,28372.0,28372.0,28372.0,28372,28372.0
unique,,5426,23689,,7,28372,,,,,...,,,,,,,,,8,
top,,johnny cash,tonight,,pop,stand today like dream start live night eye he...,,,,,...,,,,,,,,,sadness,
freq,,190,17,,7042,1,,,,,...,,,,,,,,,6096,
mean,42946.323558,,,1990.236888,,,73.028444,0.021112,0.118396,0.120973,...,0.129389,0.030996,0.533348,0.665249,0.3392347,0.080049,0.532864,0.569875,,0.425187
std,24749.325492,,,18.487463,,,41.829831,0.05237,0.178684,0.1722,...,0.181143,0.071652,0.173218,0.108434,0.3267143,0.211245,0.250972,0.244385,,0.264107
min,0.0,,,1950.0,,,1.0,0.000291,0.000284,0.000291,...,0.000284,0.000289,0.005415,0.0,2.811248e-07,0.0,0.0,0.0,,0.014286
25%,20391.25,,,1975.0,,,42.0,0.000923,0.00112,0.00117,...,0.001144,0.000993,0.412975,0.595364,0.03423598,0.0,0.329143,0.380361,,0.185714
50%,45405.5,,,1991.0,,,63.0,0.001462,0.002506,0.006579,...,0.005263,0.001754,0.538612,0.67905,0.2259028,8.5e-05,0.539365,0.580567,,0.414286
75%,64090.5,,,2007.0,,,93.0,0.004049,0.192608,0.197793,...,0.235113,0.032622,0.656666,0.749026,0.6325298,0.009335,0.738252,0.772766,,0.642857


In [19]:
fs.ls('worm-begin')

['worm-begin/album_details_25k.csv',
 'worm-begin/lyrics_25k.csv',
 'worm-begin/songs_details_25k.csv']

In [20]:
!aws s3 ls 'worm-begin'

2021-04-07 08:02:29     105621 album_details_25k.csv
2021-04-07 08:09:22   27655251 decades_tcc_ceds_music.csv
2021-04-07 08:19:38  276163416 genres_lyrics_data.csv
2021-04-07 08:22:31  187407184 labeled_lyrics_cleaned.csv
2021-04-07 08:02:16   40754845 lyrics_25k.csv
2021-04-07 08:02:30    2276377 songs_details_25k.csv


In [21]:
s3_url = upload_file_to_bucket('worm-begin','genres_artists_data.csv')
!aws s3 ls 'worm-begin'

2021-04-07 08:02:29     105621 album_details_25k.csv
2021-04-07 08:09:22   27655251 decades_tcc_ceds_music.csv
2021-04-07 08:34:22     348120 genres_artists_data.csv
2021-04-07 08:19:38  276163416 genres_lyrics_data.csv
2021-04-07 08:22:31  187407184 labeled_lyrics_cleaned.csv
2021-04-07 08:02:16   40754845 lyrics_25k.csv
2021-04-07 08:02:30    2276377 songs_details_25k.csv


Everything collected so far has now been uploaded to the WORM bucket.
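Uploading into an Object Lock bucket only protects an object if the bucket has a default retention period or the upload sets retention itself. boto3's `upload_file` should accept `ObjectLockMode` and `ObjectLockRetainUntilDate` in `ExtraArgs` (they mirror S3 PutObject parameters). A sketch of building those arguments; `object_lock_args` is hypothetical, the `GOVERNANCE` mode and 30-day period are illustrative assumptions, and this has not been run against `worm-begin`.

```python
from datetime import datetime, timedelta, timezone

VALID_MODES = {"GOVERNANCE", "COMPLIANCE"}


def object_lock_args(mode="GOVERNANCE", days=30, now=None, acl=None):
    """Build ExtraArgs for boto3's upload_file with per-object Object Lock retention.

    Hypothetical helper; keys mirror S3 PutObject's ObjectLockMode and
    ObjectLockRetainUntilDate parameters.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if days <= 0:
        raise ValueError("days must be positive")
    now = now or datetime.now(timezone.utc)
    args = {"ObjectLockMode": mode,
            "ObjectLockRetainUntilDate": now + timedelta(days=days)}
    if acl:
        args["ACL"] = acl  # e.g. 'public-read', as upload_file_to_bucket uses
    return args
```

It would slot into the existing helper as `bucket.upload_file(Filename=file_path, Key=file_name, ExtraArgs=object_lock_args())`. GOVERNANCE is the safer mode for experiments because users with the `s3:BypassGovernanceRetention` permission can still override it; COMPLIANCE retention cannot be shortened or removed by anyone, including the root user, until the date passes.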

Next, check the boto3 download helper.

In [26]:
import pandas as pd
download_file_from_bucket('worm-begin', 'album_details_25k.csv', 'short_name.csv')
with open('short_name.csv') as fo:
    test3 = pd.read_csv(fo)

In [27]:
test3.head()

Unnamed: 0.1,Unnamed: 0,id,singer_name,name,type,year
0,0,5765.0,Taylor Swift Lyrics,Taylor Swift,album,2006
1,1,6432.0,Taylor Swift Lyrics,Sounds Of The Season: The Taylor Swift Holiday...,EP,2007
2,2,6995.0,Taylor Swift Lyrics,Fearless,album,2008
3,3,10358.0,Taylor Swift Lyrics,Speak Now,album,2010
4,4,24353.0,Taylor Swift Lyrics,Red,album,2012
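Beyond eyeballing `head()`, a cheap sanity check is comparing the local file size with the size the CLI listed for the object (105,621 bytes for `album_details_25k.csv`). A sketch; `matches_listed_size` is hypothetical, and a size match is only a coarse check, not a checksum.

```python
import os


def matches_listed_size(local_path: str, listed_size: int) -> bool:
    """True if the local file has exactly the byte count S3 reported for the object.

    Hypothetical helper: coarse only; it cannot detect same-size corruption.
    """
    return os.path.getsize(local_path) == listed_size
```

If the download above completed, `matches_listed_size('short_name.csv', 105621)` should return `True`.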
