# Data Storytelling with Tableau and Python 
## 1. Create virtual environment in VS Code  
1.1 In VS Code, choose File > Open Folder and open your project folder, OR in a terminal run cd C:\Users\laura\OneDrive\Desktop\projects\tableau_python (replace the path with your own project folder path)
<br>
1.2 Press Ctrl + Shift + P to open the Command Palette and run the command that creates a virtual environment (search for "create environment")
<br>
1.3 Select kernel: venv
You may see a popup saying that your virtual environment is active even though the terminal prompt may not show it. 
To verify this, check the Explorer tab on the left: your project folder should now contain a folder called .venv.  
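
Besides looking for the .venv folder, you can ask Python itself whether it is running inside a virtual environment. The following is a minimal sketch, assuming it is run with the interpreter (or notebook kernel) you selected above; the helper name `in_virtual_env` is made up for this illustration.

```python
import sys

def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_virtual_env())
print("interpreter location:", sys.prefix)
```

If the first line prints False, the kernel or terminal is probably still using the system-wide Python rather than the environment you created.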

>[ Manual way to create a virtual environment: 
in VS Code: Terminal > New Terminal 
in the terminal, make sure you are in your project folder (check with pwd): 
<br>
run python -m venv venv and hit Enter. This creates a new folder named venv that holds your project dependencies 
<br>
then run venv\Scripts\activate and hit Enter again (this is the Windows command; on macOS/Linux use source venv/bin/activate)]

## 2. Install Python, Tableau and any packages you may need
2.1 Upgrade pip: python -m pip install --upgrade pip
<br>
2.2 Install Python (if it is not installed yet): open https://www.python.org/downloads/ in your browser, download the latest version and follow the on-screen instructions
<br>
2.3 Install Tableau Public: https://public.tableau.com/app/discover
<br>
2.4 Install the Python packages imported below: python -m pip install pandas opendatasets
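
Before running the import cell, it can help to check that the packages it needs are actually available in the active environment. This is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list of required names is taken from the imports in the next cell.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Third-party packages imported in the next cell of this notebook.
required = ["pandas", "opendatasets"]
missing = missing_packages(required)
if missing:
    print("Install these first: python -m pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```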

In [165]:
import pandas as pd  # data analysis library
import csv  # standard-library module for reading and writing CSV (raw data) files
import os  # operating system utilities, used here to list the downloaded files
import opendatasets as od  # downloads a dataset from Kaggle

## 3. Find a dataset you would like to investigate
For free datasets: Kaggle. 
Other options: 
- https://data.fivethirtyeight.com/
- https://github.com/BuzzFeedNews/everything
- https://archive.ics.uci.edu/ml/datasets.php
- https://github.com/awesomedata/awesome-public-datasets
- https://www.kaggle.com/datasets?license=cc
- https://datasetsearch.research.google.com/

In [166]:
# in Kaggle, navigate to your chosen dataset, copy its URL and paste it here
dataset_url = 'https://www.kaggle.com/datasets/iridazzle/webtoon-originals-datasets?select=webtoon_originals_en.csv'

In [167]:
od.download(dataset_url)

Skipping, found downloaded files in ".\webtoon-originals-datasets" (use force=True to force download)


In [168]:
# the download folder is named after the last part of the URL path: copy the part after the author's name and before any ?select=... or ?resource=download suffix
directory = './webtoon-originals-datasets'
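
Instead of copying the folder name by hand, it can be derived from the dataset URL. The sketch below assumes the download folder is always named after the last segment of the URL path, which matches the "Skipping, found downloaded files" message above but is not checked against the opendatasets documentation; `dataset_dir_from_url` is a hypothetical helper.

```python
from urllib.parse import urlparse

def dataset_dir_from_url(url: str) -> str:
    """Guess the local download folder from a Kaggle dataset URL."""
    # urlparse separates the path from the ?select=... query string.
    path = urlparse(url).path
    # The path looks like /datasets/<author>/<dataset-name>; keep the last segment.
    dataset_name = path.rstrip("/").split("/")[-1]
    return "./" + dataset_name

url = "https://www.kaggle.com/datasets/iridazzle/webtoon-originals-datasets?select=webtoon_originals_en.csv"
print(dataset_dir_from_url(url))  # ./webtoon-originals-datasets
```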


In [169]:
os.listdir(directory)

['get_webtoon_csv.ipynb',
 'webtoon_dataframes.ipynb',
 'webtoon_originals_de.csv',
 'webtoon_originals_en.csv',
 'webtoon_originals_es.csv',
 'webtoon_originals_fr.csv',
 'webtoon_originals_id.csv',
 'webtoon_originals_th.csv',
 'webtoon_originals_zh-hant.csv']
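
The folder also contains notebooks, so if you only care about the data files you can filter the listing. A minimal sketch with pathlib, assuming the folder name defined above; `list_csv_files` is a name chosen for this example.

```python
from pathlib import Path

def list_csv_files(directory):
    """Return the sorted names of the .csv files directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob("*.csv"))

# Only list the files if the download folder from the step above exists.
directory = "./webtoon-originals-datasets"
if Path(directory).is_dir():
    for name in list_csv_files(directory):
        print(name)
```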

In [170]:
# load your file into a pandas DataFrame, ready for further analysis
# os.path.join avoids backslashes in the string, which Python may treat as escape sequences
df = pd.read_csv(os.path.join(directory, 'webtoon_originals_en.csv'))

In [171]:
df.head()

Unnamed: 0,title_id,title,genre,authors,weekdays,length,subscribers,rating,views,likes,status,daily_pass,synopsis
0,4603,Red Hood: Outlaws,ACTION,"Patrick R. Young,Nico Bascuñan",MONDAY,5,219484,9.64,997928,108280,ONGOING,False,The Outlaws try to go legit -- and fail specta...
1,4596,A Dance of Swords in the Night,ACTION,"JP,MinskySong,Yu jin sung",WEDNESDAY,4,61812,9.6,207372,24226,ONGOING,False,Kidnapped as a child to be trained in the art ...
2,4583,Kill the Dragon,ACTION,"Miss Jisu,J-Sun(REDICE STUDIO),Baeksu Noble",THURSDAY,7,176840,8.84,1687308,121186,ONGOING,False,The human race plunged into a war against drag...
3,4582,Mythic Item Obtained,FANTASY,"Jung SeonYul,Hess",THURSDAY,5,167086,9.43,842795,67739,ONGOING,False,In a world where technology and magic are almo...
4,4572,Marry My Husband,ROMANCE,"sungsojak,LICO",MONDAY,3,24449,9.49,71290,8516,ONGOING,False,"When Jiwon, a 37-year-old cancer patient, walk..."


In [172]:
df.dtypes

title_id         int64
title           object
genre           object
authors         object
weekdays        object
length           int64
subscribers      int64
rating         float64
views            int64
likes            int64
status          object
daily_pass        bool
synopsis        object
dtype: object

In [173]:
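# remove entire columns that are not needed
# axis=1 drops columns, axis=0 would drop rows
# note: drop() returns a new DataFrame and leaves df unchanged; assign the result (df = df.drop(...)) to keep it
df.drop(['title_id', 'weekdays', 'daily_pass'], axis=1)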

Unnamed: 0,title,genre,authors,length,subscribers,rating,views,likes,status,synopsis
0,Red Hood: Outlaws,ACTION,"Patrick R. Young,Nico Bascuñan",5,219484,9.64,997928,108280,ONGOING,The Outlaws try to go legit -- and fail specta...
1,A Dance of Swords in the Night,ACTION,"JP,MinskySong,Yu jin sung",4,61812,9.60,207372,24226,ONGOING,Kidnapped as a child to be trained in the art ...
2,Kill the Dragon,ACTION,"Miss Jisu,J-Sun(REDICE STUDIO),Baeksu Noble",7,176840,8.84,1687308,121186,ONGOING,The human race plunged into a war against drag...
3,Mythic Item Obtained,FANTASY,"Jung SeonYul,Hess",5,167086,9.43,842795,67739,ONGOING,In a world where technology and magic are almo...
4,Marry My Husband,ROMANCE,"sungsojak,LICO",3,24449,9.49,71290,8516,ONGOING,"When Jiwon, a 37-year-old cancer patient, walk..."
...,...,...,...,...,...,...,...,...,...,...
806,Tales of the Unusual,HORROR,Sungdae Oh,377,799442,9.66,154504958,9518991,ONGOING,In dangerous urban legends and dark ancient my...
807,Knight Run,SF,Sungmin Kim,170,92577,9.31,4667488,165808,COMPLETED,If you could be teleported down to the planet’...
808,The God of High School,ACTION,Yongje Park,557,2683666,9.69,743008133,28814309,ONGOING,Mori Jin is a high school student and Taekwond...
809,HIVE,THRILLER,Kyusam Kim,17,462207,9.66,66639952,2861836,COMPLETED,Gigantic oxygen-doped bees are attempting to d...
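
Note that the cell above only displays the trimmed table: drop() returns a new DataFrame and does not change df, which is why title_id still shows up in later cells. The sketch below illustrates this on two rows copied from the df.head() output; the names `sample` and `trimmed` are made up for the example.

```python
import pandas as pd

# Two rows copied from the df.head() output above, limited to a few columns.
sample = pd.DataFrame({
    "title_id": [4603, 4596],
    "title": ["Red Hood: Outlaws", "A Dance of Swords in the Night"],
    "weekdays": ["MONDAY", "WEDNESDAY"],
    "daily_pass": [False, False],
})

# drop() returns a new DataFrame; `sample` itself keeps all four columns.
trimmed = sample.drop(columns=["title_id", "weekdays", "daily_pass"])

print(list(sample.columns))   # ['title_id', 'title', 'weekdays', 'daily_pass']
print(list(trimmed.columns))  # ['title']
```

Assigning the result to a new name keeps the original DataFrame available in case you need the dropped columns later.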


In [174]:
df.describe()

Unnamed: 0,title_id,length,subscribers,rating,views,likes
count,811.0,811.0,811.0,811.0,811.0,811.0
mean,2239.130703,79.254007,451915.6,9.373009,40335320.0,3070339.0
std,1275.850758,122.830012,672326.3,0.5952,108153200.0,6572596.0
min,64.0,1.0,6014.0,4.09,71290.0,8516.0
25%,1261.5,15.0,109249.0,9.25,2461856.0,255740.5
50%,2384.0,46.0,221462.0,9.54,8160504.0,772222.0
75%,3246.5,87.0,502221.5,9.72,27889200.0,2810740.0
max,4603.0,1410.0,7128661.0,9.94,1121118000.0,59844920.0


In [175]:
# use a raw string (r'...') for Windows paths: in a normal string, the \U in 'C:\Users' starts a Unicode escape and raises
# SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
df.to_csv(r'C:\Users\laura\OneDrive\Desktop\projects\tableau_python\df2.csv', index=False)
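
The unicodeescape error comes from how Python reads backslashes in ordinary strings: \U is expected to start an 8-digit Unicode escape, and \t (as in \tableau_python) would silently turn into a tab. The sketch below shows three spellings of the same path that avoid the problem; it uses only the standard library and does not write any file.

```python
from pathlib import PureWindowsPath

# Three spellings of the same Windows path that Python parses without error.
raw_string = r"C:\Users\laura\OneDrive\Desktop\projects\tableau_python\df2.csv"
escaped = "C:\\Users\\laura\\OneDrive\\Desktop\\projects\\tableau_python\\df2.csv"
forward = "C:/Users/laura/OneDrive/Desktop/projects/tableau_python/df2.csv"

# The raw string and the escaped string are the same text.
print(raw_string == escaped)  # True
# PureWindowsPath treats forward slashes and backslashes alike.
print(PureWindowsPath(forward) == PureWindowsPath(raw_string))  # True
print(PureWindowsPath(raw_string).name)  # df2.csv
```

Any of the three forms can be passed to df.to_csv; the raw string is usually the least effort when pasting a path copied from Windows Explorer.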