***

# **Analysis of DTFH podcast data**

***


## Introduction

Duncan Trussell Family Hour (DTFH for short) is a podcast by the American author Duncan Trussell that takes a not-too-serious look at topics in modern spirituality. Guests range from journalists, comedians and musicians to Buddhist gurus, psychotherapists and practitioners of pagan spiritual rituals. An episode usually takes the form of a free-flowing conversation between Duncan and his guest (or guests, when more than one person joins him on air). Frequent topics include life anecdotes, views on spirituality, all sorts of conversations about meditation, and the use of hallucinogenic psychoactive substances as a means of understanding oneself and the world around us.

In this document I analyze some of the data I collected from the author's website ([DTFH](http://www.duncantrussell.com/episodes)). Let us start by importing the Python package pandas, which I use for the analysis, and loading the file DTFH.csv, which holds the data for the individual episodes. Then we take a look at the raw data:

In [28]:
# Import pandas and load the CSV file
import pandas as pd
podcasti = pd.read_csv('C:/faks/programiranje 1/DTFH-analiza-podatkov/DTFH.csv')
pd.options.display.max_rows = 20

In [95]:
# Inspect the raw data
podcasti

Unnamed: 0,naslov,povezava,dolzina,datum,eksplicitnost,opis,dolzina_h,mesec
0,Erin Trussell,https://audioboom.com/posts/8026230,4959,"Sat, 05 Feb 2022 02:20:29",no,"Erin Trussell, cult-mommy and Duncan's intima...","(1, 22, 39)",Feb 2022
1,David Nichtern,https://audioboom.com/posts/8021385,5866,"Sat, 29 Jan 2022 03:56:56",no,"David Nichtern, Senior Buddhist teacher at Dh...","(1, 37, 46)",Jan 2022
2,David Chernikoff,https://audioboom.com/posts/8017010,6609,"Fri, 21 Jan 2022 17:06:58",no,"David Chernikoff, spiritual teacher and autho...","(1, 50, 9)",Jan 2022
3,Andrew Yang,https://audioboom.com/posts/8013368,4807,"Sat, 15 Jan 2022 03:35:02",no,"Andrew Yang, brilliant human and presidential...","(1, 20, 7)",Jan 2022
4,Nikki Walton,https://audioboom.com/posts/8009414,4909,"Sat, 08 Jan 2022 02:05:22",no,"Nikki Walton, best-selling author and NAACP a...","(1, 21, 49)",Jan 2022
...,...,...,...,...,...,...,...,...
395,Derrick Beckles is a HOT PACKAGE,https://audioboom.com/posts/6794027,6737,"Tue, 03 Dec 2013 03:00:46",no,"Derrick Beckles (TV carnage, Eric Andre Show, ...","(1, 52, 17)",Dec 2013
396,Erin McGathy confronts the darklord,https://audioboom.com/posts/6794028,7947,"Tue, 26 Nov 2013 21:52:56",no,"Erin McGathy (host of the amazing ""This Feels ...","(2, 12, 27)",Nov 2013
397,DUSTIN MARSHALL,https://audioboom.com/posts/6794029,7523,"Wed, 20 Nov 2013 22:30:49",no,The brilliant lord of podcasts and father of F...,"(2, 5, 23)",Nov 2013
398,DAN HARMON!!!!!,https://audioboom.com/posts/6794030,6296,"Mon, 11 Nov 2013 22:34:41",no,"Dan Harmon (Community, Harmontown) returns to ...","(1, 44, 56)",Nov 2013


## Trends in episode publication and duration

### Average episode duration

Let us start by computing the average episode length. Because the CSV file stores the length in seconds (the format in which it is recorded on iTunes), we also define a helper function that rewrites it in a more readable form (hours, minutes, seconds).

In [78]:
# Helper function, plus a new column added to the table.
def sekunde_v_ure(n):
    h = n // 3600
    minute = (n // 60) - h * 60  # named `minute` to avoid shadowing the built-in min()
    sec = n - 60 * minute - 3600 * h
    return (h, minute, sec)

podcasti['dolzina_h'] = (podcasti.dolzina).apply(sekunde_v_ure)
podcasti

Unnamed: 0,naslov,povezava,dolzina,datum,eksplicitnost,opis,dolzina_h,mesec
0,Erin Trussell,https://audioboom.com/posts/8026230,4959,"Sat, 05 Feb 2022 02:20:29",no,"Erin Trussell, cult-mommy and Duncan's intima...","(1, 22, 39)",Feb 2022
1,David Nichtern,https://audioboom.com/posts/8021385,5866,"Sat, 29 Jan 2022 03:56:56",no,"David Nichtern, Senior Buddhist teacher at Dh...","(1, 37, 46)",Jan 2022
2,David Chernikoff,https://audioboom.com/posts/8017010,6609,"Fri, 21 Jan 2022 17:06:58",no,"David Chernikoff, spiritual teacher and autho...","(1, 50, 9)",Jan 2022
3,Andrew Yang,https://audioboom.com/posts/8013368,4807,"Sat, 15 Jan 2022 03:35:02",no,"Andrew Yang, brilliant human and presidential...","(1, 20, 7)",Jan 2022
4,Nikki Walton,https://audioboom.com/posts/8009414,4909,"Sat, 08 Jan 2022 02:05:22",no,"Nikki Walton, best-selling author and NAACP a...","(1, 21, 49)",Jan 2022
...,...,...,...,...,...,...,...,...
395,Derrick Beckles is a HOT PACKAGE,https://audioboom.com/posts/6794027,6737,"Tue, 03 Dec 2013 03:00:46",no,"Derrick Beckles (TV carnage, Eric Andre Show, ...","(1, 52, 17)",Dec 2013
396,Erin McGathy confronts the darklord,https://audioboom.com/posts/6794028,7947,"Tue, 26 Nov 2013 21:52:56",no,"Erin McGathy (host of the amazing ""This Feels ...","(2, 12, 27)",Nov 2013
397,DUSTIN MARSHALL,https://audioboom.com/posts/6794029,7523,"Wed, 20 Nov 2013 22:30:49",no,The brilliant lord of podcasts and father of F...,"(2, 5, 23)",Nov 2013
398,DAN HARMON!!!!!,https://audioboom.com/posts/6794030,6296,"Mon, 11 Nov 2013 22:34:41",no,"Dan Harmon (Community, Harmontown) returns to ...","(1, 44, 56)",Nov 2013


In [36]:
# Compute the average length
pov_dolzina = round((podcasti.dolzina).mean())
print(sekunde_v_ure(pov_dolzina))

(1, 34, 10)


The average episode length is therefore 1 h 34 min.
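The same conversion can be written with Python's built-in `divmod`, which returns a quotient and a remainder in one call. The sketch below is an alternative to `sekunde_v_ure`, not the code used in this analysis; the name `seconds_to_hms` is hypothetical.

```python
def seconds_to_hms(n):
    """Split a duration in whole seconds into (hours, minutes, seconds)."""
    minutes, sec = divmod(n, 60)      # total minutes and leftover seconds
    h, minutes = divmod(minutes, 60)  # whole hours and leftover minutes
    return (h, minutes, sec)


# Durations taken from the first two rows of the table above.
print(seconds_to_hms(4959))  # (1, 22, 39)
print(seconds_to_hms(5866))  # (1, 37, 46)
```

Both results match the `dolzina_h` values shown for those rows.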

### Distribution of episodes by length

Let us also see how many episodes are shorter than one hour, how many fall between one and two hours, and how many are longer than two hours.

In [47]:
# All episodes sorted by length
rezina_dolzin = (podcasti[['naslov', 'dolzina']]).copy()
rezina_dolzin['dolzina_h'] = (podcasti.dolzina).apply(sekunde_v_ure)
rezina_dolzin.sort_values('dolzina')

Unnamed: 0,naslov,dolzina,dolzina_h
60,Introducing Dark Air With Terry Carnation,852,"(0, 14, 12)"
96,Raghu Markus,2492,"(0, 41, 32)"
264,David Nichtern,2570,"(0, 42, 50)"
20,"The Leather Rose, Episode 1",2901,"(0, 48, 21)"
300,LAMA SURYA DAS,3324,"(0, 55, 24)"
...,...,...,...
236,Aubrey Marcus LIVE from the Brooklyn Bell House,9614,"(2, 40, 14)"
98,Shane Mauss,9626,"(2, 40, 26)"
393,BRODY STEVENS,9646,"(2, 40, 46)"
76,Shane Mauss,10600,"(2, 56, 40)"


In [49]:
# Episodes shorter than 1 hour
rezina_dolzin_kratke = rezina_dolzin[rezina_dolzin.dolzina < 3600]
rezina_dolzin_kratke.sort_values('dolzina')

Unnamed: 0,naslov,dolzina,dolzina_h
60,Introducing Dark Air With Terry Carnation,852,"(0, 14, 12)"
96,Raghu Markus,2492,"(0, 41, 32)"
264,David Nichtern,2570,"(0, 42, 50)"
20,"The Leather Rose, Episode 1",2901,"(0, 48, 21)"
300,LAMA SURYA DAS,3324,"(0, 55, 24)"
334,JOAN HALIFAX and RAGHU MARKUS,3524,"(0, 58, 44)"
221,Wayne Coyne,3541,"(0, 59, 1)"


In [63]:
# Episodes between 1 and 2 hours
rezina_dolzin_pov = rezina_dolzin[(rezina_dolzin.dolzina >= 3600) & (rezina_dolzin.dolzina < 7200)]
rezina_dolzin_pov.sort_values('dolzina')


Unnamed: 0,naslov,dolzina,dolzina_h
359,JACK KORNFIELD,3614,"(1, 0, 14)"
351,Krishna Das,3645,"(1, 0, 45)"
284,George Noory is THE NIGHTHAWK,3678,"(1, 1, 18)"
322,Dr. Drew And Fred Stoller,3704,"(1, 1, 44)"
160,David Nichtern,3734,"(1, 2, 14)"
...,...,...,...
140,Emil Amos,7048,"(1, 57, 28)"
392,LANCE BANGS!,7064,"(1, 57, 44)"
354,BERT KREISCHER,7067,"(1, 57, 47)"
47,Jason Louv,7159,"(1, 59, 19)"


In [65]:
# Episodes of 2 hours or longer
rezina_dolzin_dolge = rezina_dolzin[rezina_dolzin.dolzina >= 7200]
rezina_dolzin_dolge.sort_values('dolzina')

Unnamed: 0,naslov,dolzina,dolzina_h
204,Karen Kilgariff,7200,"(2, 0, 0)"
312,Kevin Johnson,7246,"(2, 0, 46)"
137,Rob Schrab,7246,"(2, 0, 46)"
211,Natasha Leggero and Riki Lindhome,7260,"(2, 1, 0)"
127,Susan Marrufo,7282,"(2, 1, 22)"
...,...,...,...
236,Aubrey Marcus LIVE from the Brooklyn Bell House,9614,"(2, 40, 14)"
98,Shane Mauss,9626,"(2, 40, 26)"
393,BRODY STEVENS,9646,"(2, 40, 46)"
76,Shane Mauss,10600,"(2, 56, 40)"


In [66]:
# Count the episodes in each of the groups above
st_kratkih = len(rezina_dolzin_kratke.index)
st_pov = len(rezina_dolzin_pov.index)
st_dolgih = len(rezina_dolzin_dolge.index)
print(st_kratkih, st_pov, st_dolgih)

7 336 57


So only 7 episodes are shorter than one hour, while 57 run two hours or longer; the longest, at 2 h 56 min 40 s, falls just short of three hours. The vast majority of episodes (336) last between one and two hours.
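The three groups can also be counted in one step with `pandas.cut`, which assigns each value to a bin. This is a sketch on a small hand-picked sample of durations from the tables above, not the full dataset; the label strings and the name `sample` are assumptions made for illustration.

```python
import pandas as pd

# A few durations (in seconds) copied from the tables above.
sample = pd.Series([852, 3541, 3614, 7159, 7200, 10600])

# right=False makes the bins half-open on the right:
# [0, 3600), [3600, 7200), [7200, inf)
groups = pd.cut(
    sample,
    bins=[0, 3600, 7200, float("inf")],
    right=False,
    labels=["under 1 h", "1 h to 2 h", "2 h or more"],
)

# sort=False keeps the counts in bin order rather than by frequency.
counts = groups.value_counts(sort=False)
print(counts)
```

With `right=False`, an episode of exactly 7200 seconds lands in the last bin, which matches the `>= 7200` condition used above.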

### Months of episode publication

Let us look at when the episodes were published. In the original table the publication times are given down to the day and even the time of day. The only relevant information is the month and year in which an episode came out, so let us first write a function that extracts this form.

In [99]:
# Helper function definition, plus a new 'mesec' column added to the table.
def pretvori_v_mesece(datum):
    # "Sat, 05 Feb 2022 02:20:29" -> "Feb 2022"
    return datum[8:-9]

podcasti['mesec'] = (podcasti.datum).apply(pretvori_v_mesece)
podcasti

Unnamed: 0,naslov,povezava,dolzina,datum,eksplicitnost,opis,dolzina_h,mesec
0,Erin Trussell,https://audioboom.com/posts/8026230,4959,"Sat, 05 Feb 2022 02:20:29",no,"Erin Trussell, cult-mommy and Duncan's intima...","(1, 22, 39)",Feb 2022
1,David Nichtern,https://audioboom.com/posts/8021385,5866,"Sat, 29 Jan 2022 03:56:56",no,"David Nichtern, Senior Buddhist teacher at Dh...","(1, 37, 46)",Jan 2022
2,David Chernikoff,https://audioboom.com/posts/8017010,6609,"Fri, 21 Jan 2022 17:06:58",no,"David Chernikoff, spiritual teacher and autho...","(1, 50, 9)",Jan 2022
3,Andrew Yang,https://audioboom.com/posts/8013368,4807,"Sat, 15 Jan 2022 03:35:02",no,"Andrew Yang, brilliant human and presidential...","(1, 20, 7)",Jan 2022
4,Nikki Walton,https://audioboom.com/posts/8009414,4909,"Sat, 08 Jan 2022 02:05:22",no,"Nikki Walton, best-selling author and NAACP a...","(1, 21, 49)",Jan 2022
...,...,...,...,...,...,...,...,...
395,Derrick Beckles is a HOT PACKAGE,https://audioboom.com/posts/6794027,6737,"Tue, 03 Dec 2013 03:00:46",no,"Derrick Beckles (TV carnage, Eric Andre Show, ...","(1, 52, 17)",Dec 2013
396,Erin McGathy confronts the darklord,https://audioboom.com/posts/6794028,7947,"Tue, 26 Nov 2013 21:52:56",no,"Erin McGathy (host of the amazing ""This Feels ...","(2, 12, 27)",Nov 2013
397,DUSTIN MARSHALL,https://audioboom.com/posts/6794029,7523,"Wed, 20 Nov 2013 22:30:49",no,The brilliant lord of podcasts and father of F...,"(2, 5, 23)",Nov 2013
398,DAN HARMON!!!!!,https://audioboom.com/posts/6794030,6296,"Mon, 11 Nov 2013 22:34:41",no,"Dan Harmon (Community, Harmontown) returns to ...","(1, 44, 56)",Nov 2013


In [100]:
def pretvori_v_leta(datum):
    # "Sat, 05 Feb 2022 02:20:29" -> "2022"
    return datum[12:-9]

podcasti['leto'] = (podcasti.datum).apply(pretvori_v_leta)
podcasti


Unnamed: 0,naslov,povezava,dolzina,datum,eksplicitnost,opis,dolzina_h,mesec,leto
0,Erin Trussell,https://audioboom.com/posts/8026230,4959,"Sat, 05 Feb 2022 02:20:29",no,"Erin Trussell, cult-mommy and Duncan's intima...","(1, 22, 39)",Feb 2022,2022
1,David Nichtern,https://audioboom.com/posts/8021385,5866,"Sat, 29 Jan 2022 03:56:56",no,"David Nichtern, Senior Buddhist teacher at Dh...","(1, 37, 46)",Jan 2022,2022
2,David Chernikoff,https://audioboom.com/posts/8017010,6609,"Fri, 21 Jan 2022 17:06:58",no,"David Chernikoff, spiritual teacher and autho...","(1, 50, 9)",Jan 2022,2022
3,Andrew Yang,https://audioboom.com/posts/8013368,4807,"Sat, 15 Jan 2022 03:35:02",no,"Andrew Yang, brilliant human and presidential...","(1, 20, 7)",Jan 2022,2022
4,Nikki Walton,https://audioboom.com/posts/8009414,4909,"Sat, 08 Jan 2022 02:05:22",no,"Nikki Walton, best-selling author and NAACP a...","(1, 21, 49)",Jan 2022,2022
...,...,...,...,...,...,...,...,...,...
395,Derrick Beckles is a HOT PACKAGE,https://audioboom.com/posts/6794027,6737,"Tue, 03 Dec 2013 03:00:46",no,"Derrick Beckles (TV carnage, Eric Andre Show, ...","(1, 52, 17)",Dec 2013,2013
396,Erin McGathy confronts the darklord,https://audioboom.com/posts/6794028,7947,"Tue, 26 Nov 2013 21:52:56",no,"Erin McGathy (host of the amazing ""This Feels ...","(2, 12, 27)",Nov 2013,2013
397,DUSTIN MARSHALL,https://audioboom.com/posts/6794029,7523,"Wed, 20 Nov 2013 22:30:49",no,The brilliant lord of podcasts and father of F...,"(2, 5, 23)",Nov 2013,2013
398,DAN HARMON!!!!!,https://audioboom.com/posts/6794030,6296,"Mon, 11 Nov 2013 22:34:41",no,"Dan Harmon (Community, Harmontown) returns to ...","(1, 44, 56)",Nov 2013,2013
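Slicing by fixed positions works only while every date string has exactly the same layout. As a more robust alternative, the strings can be parsed into real timestamps with `pandas.to_datetime`. This is a sketch on three sample rows; the format string is an assumption based on the dates shown above, and the frame `sample` is hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    "datum": [
        "Sat, 05 Feb 2022 02:20:29",
        "Tue, 03 Dec 2013 03:00:46",
        "Mon, 11 Nov 2013 22:34:41",
    ]
})

# Parse the text into timestamps; the format mirrors the strings above.
parsed = pd.to_datetime(sample["datum"], format="%a, %d %b %Y %H:%M:%S")

sample["mesec"] = parsed.dt.strftime("%b %Y")  # e.g. "Feb 2022"
sample["leto"] = parsed.dt.year                # integer year
print(sample[["mesec", "leto"]])
```

Unlike the slicing approach, this stores the year as a number, so it sorts and compares numerically.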


Let us now see how many episodes were published over time, starting with the number per year.

In [108]:
# Number of episodes per year
podcasti_epizode = podcasti.groupby('leto').size()
podcasti_epizode

leto
 2013      7
 2014     43
 2015     42
 2016     45
 2017     47
 2018     47
 2019     46
 2020     49
 2021     68
 2022      6
dtype: int64
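The same count can be made per month. Grouping on the text column `mesec` would sort the months alphabetically, so the sketch below first converts the dates to monthly periods, which sort chronologically. It uses a handful of dates from the tables above rather than the full dataset, and the names `dates`, `months` and `per_month` are hypothetical.

```python
import pandas as pd

dates = pd.Series([
    "Sat, 05 Feb 2022 02:20:29",
    "Sat, 29 Jan 2022 03:56:56",
    "Fri, 21 Jan 2022 17:06:58",
    "Tue, 03 Dec 2013 03:00:46",
    "Tue, 26 Nov 2013 21:52:56",
    "Wed, 20 Nov 2013 22:30:49",
])

# Parse the strings, then reduce each timestamp to its calendar month.
months = pd.to_datetime(dates, format="%a, %d %b %Y %H:%M:%S").dt.to_period("M")

# Count episodes per month and order the result chronologically.
per_month = months.value_counts().sort_index()
print(per_month)
```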