# Music Charts

In this example, we extract music chart information from tables on Wikipedia.
We start from Wikidata entities that already have a "charted in" ([P2291](https://www.wikidata.org/wiki/Property:P2291)) statement, and use the tables on their English Wikipedia pages to populate that property. It often occurs with the qualifiers "point in time" ([P585](https://www.wikidata.org/wiki/Property:P585)) and "ranking" ([P1352](https://www.wikidata.org/wiki/Property:P1352)).
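
To make the statement-plus-qualifier structure concrete, the following is a minimal sketch of a query that reads existing "charted in" statements together with their two qualifiers. The helper name `charted_in_query` and its `limit` parameter are our own illustration and not part of takco or rdflib. It only builds the query string and assumes the `p:`, `ps:` and `pq:` prefixes that the Wikidata Query Service predefines.

```python
def charted_in_query(limit=10):
    """Build a SPARQL query for "charted in" statements and their qualifiers.

    Hypothetical helper for illustration; it only returns a string.
    """
    return f"""
SELECT ?entity ?chart ?date ?rank WHERE {{
    # The statement node carries both the value and the qualifiers
    ?entity p:P2291 ?statement .
    ?statement ps:P2291 ?chart .
    # Qualifiers are optional, because not every statement has them
    OPTIONAL {{ ?statement pq:P585 ?date . }}
    OPTIONAL {{ ?statement pq:P1352 ?rank . }}
}}
LIMIT {int(limit)}
"""


query = charted_in_query(limit=5)
print(query)
```

The resulting string could be passed to `st.query(...)` in the same way as the query in the first cell.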

In [1]:
from rdflib.plugins.stores.sparqlstore import SPARQLStore

# Query the public Wikidata SPARQL endpoint
st = SPARQLStore('http://query.wikidata.org/sparql')
r = st.query("""
SELECT DISTINCT ?entity ?article WHERE {
    # Entities with a "charted in" (P2291) statement and an English Wikipedia article
    ?entity wdt:P2291 ?chart .
    ?article schema:about ?entity .
    ?article schema:isPartOf <https://en.wikipedia.org/>.
}
""")
# One (entity, article URL) pair per result row
ent_abouturl = [tuple(b[v] for v in r.vars) for b in r.bindings]
# Rewrite the article URLs to point at a local Wikipedia mirror
ent_abouturl = sorted([
    (e, url.replace('https://en.wikipedia.org/wiki/', 'http://bricks07:8989/wikipedia_en_all_nopic_2020-10/A/'))
    for e, url in ent_abouturl
])
len(ent_abouturl)



436

In [2]:
%%time
import takco, tqdm

# Download the pages (with a progress bar) and extract their tables
sample = tqdm.tqdm(ent_abouturl)
pages = takco.extract.Download(sample, encoding='utf8').load()
extracted = list(takco.extract.extract_tables(pages, link_pattern=r'[^\W](?!ttp:)'))
print(f"Got {len(extracted)} tables")
takco.preview(extracted, nrows=5, ntables=25)

100%|██████████| 436/436 [01:32<00:00,  4.70it/s]


Got 1424 tables
CPU times: user 1min 25s, sys: 1.05 s, total: 1min 26s
Wall time: 1min 34s


?,0,1
Unnamed: 0_level_1,Source,Rating
,Allmusic,
,Christgau's Record Guide,C
,Rolling Stone,(Favorable)

?,0,1
Unnamed: 0_level_1,Chart (1972),Peak position
,Australian Kent Music Report Albums Chart,1
,Canadian RPM Albums Chart,1
,Dutch Mega Albums Chart,2
,French SNEP Albums Chart,4
,Italian Albums Chart,8

?,0,1
Unnamed: 0_level_1,Chart (1972),Position
,Australian Albums Chart,12
,French Albums Chart,21
,Italian Albums Chart,44

?,0,1
Unnamed: 0_level_1,Chart (1973),Position
,U.S. Billboard Year-End,14

?,0,1
Unnamed: 0_level_1,Region,Certification
,United States ( RIAA ),Platinum

?,0,1
Unnamed: 0_level_1,Source,Rating
,AllMusic,
,Encyclopedia of Popular Music,
,NME,10/10
,The Philadelphia Inquirer,
,Pitchfork,10/10

?,0,1
Unnamed: 0_level_1,Charts (1989),Peak position
,UK Albums Chart,13
,U.S. Billboard 200,24
,U.S. Top R&B/Hip-Hop Albums,1

?,0,1,2
Unnamed: 0_level_1,Region,Certification,Certified units /sales
,United States ( RIAA ),Platinum,"1,000,000 ^"
,* sales figures based on certification alone ^ shi (...),* sales figures based on certification alone ^ shi (...),* sales figures based on certification alone ^ shi (...)

?,0,1
Unnamed: 0_level_1,Chart (2000),Peak position
,Australia ( ARIA ),72
,US Modern Rock Tracks ( Billboard ),11
,US Mainstream Rock Tracks ( Billboard ),27

?,0,1
Unnamed: 0_level_1,Source,Rating
,Allmusic,

?,0,1,2,3,4,5
Unnamed: 0_level_1,Region,Date,Label,Format,Catalog,Note
,Japan,"July 17, 2002",Pioneer LDC,12cmCD,PICL-1257,
,Japan,"July 26, 2006",Geneon Entertainment,12cmCD,GNCL-1083,

?,0,1
Unnamed: 0_level_1,Charts (2010),Peak position
,Japan Billboard Top Albums Sales,4
,Japan Oricon daily albums,2
,Japan Oricon weekly albums,3

?,0,1
Unnamed: 0_level_1,Chart,Amount
,Oricon physical sales,59000

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Format,Distributing Label,Catalogue codes
,Japan,"March 17, 2010 ( 2010-03-17 )","CD , limited edition CD, digital download",Victor Entertainment,"VICL-63556, VICL-63557"
,South Korea,"March 23, 2010 ( 2010-03-23 )",digital download,J-Box Entertainment,
,Taiwan,"April 9, 2010 ( 2010-04-09 )",CD,Rock Records,GUT2307
,Japan,"March 18, 2015 ( 2015-03-18 )",lossless digital download,Victor,VEAHD-10617
,Japan,"August 5, 2015 ( 2015-08-05 )",LP record,Victor,VIJL-60151~2

?,0,1
Unnamed: 0_level_1,Source,Rating
,Allmusic,not rated

?,0,1,2,3,4,5
Unnamed: 0_level_1,Region,Date,Label,Format,Catalog,Note
,Japan,"July 11, 2001",Pioneer LDC,12cmCD,PICL-1219,
,Japan,"July 26, 2006",Geneon Entertainment,12cmCD,GNCL-1082,

?,0,1
Unnamed: 0_level_1,Charts,Peak position
,Japan Billboard Top Albums Sales,1
,Japan Oricon daily albums,1
,Japan Oricon weekly albums,1
,Japan Oricon monthly albums,5
,Japan Oricon yearly albums,31

?,0,1
Unnamed: 0_level_1,Chart,Amount
,Oricon physical sales,176000
,RIAJ physical certification,"Gold (100,000+)"

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Format,Distributing Label,Catalogue codes
,Japan,"March 13, 2013 ( 2013-03-13 )","CD , CD/ DVD , CD/ Blu-ray , digital download",Victor Entertainment,"VICL-63999, VIZL-519, VIZL-520"
,Taiwan,"April 12, 2013 ( 2013-04-12 )",CD,Rock Records,GUT2418.4
,South Korea,"May 29, 2013 ( 2013-05-29 )",digital download,J-Box Entertainment,
,Japan,"March 18, 2015 ( 2015-03-18 )",lossless digital download,Victor,VEAHD-10615
,Japan,"August 5, 2015 ( 2015-08-05 )",LP record,Victor,VIJL-60153~4

?,0,1,2
Unnamed: 0_level_1,Release date,Version,Oricon Weekly Albums Chart
,25 April 1984,Original release,1
,19 November 2014,30th Anniversary Edition,13

?,0,1
Unnamed: 0_level_1,Source,Rating
,AllMusic,

?,0,1
Unnamed: 0_level_1,Charts (2008),Peak position
,Japan Billboard Adult Contemporary Airplay,45
,Japan Billboard Japan Hot 100,9
,Japan Oricon weekly singles,35

?,0,1
Unnamed: 0_level_1,Chart,Amount
,Oricon physical sales,6000

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Format,Distributing Label,Catalog codes
,Japan,"November 12, 2008 ( 2008-11-12 )","ringtone, digital download",BabeStar Label,VEAML-22686
,Japan,"December 10, 2008 ( 2008-12-10 )","CD single , limited edition CD single",BabeStar Label,"VICB-35013, VICB-35014"
,Japan,"December 17, 2008 ( 2008-12-17 )",rental CD single,BabeStar Label,"VICB-35013, VICB-35014"
,South Korea,"April 13, 2009 ( 2009-04-13 )",digital download (EP),J-Box Entertainment,

?,0,1
Unnamed: 0_level_1,Source,Rating
,Allmusic,


In [4]:
with open('/export/scratch1/home/kruit/scratch/chartedIn.jsonl', 'w') as fw:
    for t in extracted:
        print(takco.util.json_dump(t), file=fw)

In [3]:
# Load the pipeline configuration and take the unpivot heuristics of its first step
steps = takco.config.build('step', load=['resources/graphs/wikidata.toml','resources/pipelines/TabEL.toml'])
unpivot_heuristics = steps[0]['unpivot_heuristics']

# Reshape the tables by unpivoting headers that contain values
reshaped = list(takco.TableSet.reshape(extracted, unpivot_heuristics=unpivot_heuristics))
print(f"Processed {len(reshaped)} tables")
takco.preview(reshaped, nrows=5, ntables=25)

Processed 259 tables


?,0,1
Unnamed: 0_level_1,Source,Rating
,Allmusic,
,Christgau's Record Guide,C
,Rolling Stone,(Favorable)

?,0,1,2
Unnamed: 0_level_1,Peak position,_Variable,Chart
,1,1972,Australian Kent Music Report Albums Chart
,1,1972,Canadian RPM Albums Chart
,2,1972,Dutch Mega Albums Chart
,4,1972,French SNEP Albums Chart
,8,1972,Italian Albums Chart

?,0,1,2
Unnamed: 0_level_1,Position,_Variable,Chart
,12,1972,Australian Albums Chart
,21,1972,French Albums Chart
,44,1972,Italian Albums Chart

?,0,1
Unnamed: 0_level_1,Source,Rating
,AllMusic,
,Encyclopedia of Popular Music,
,NME,10/10
,The Philadelphia Inquirer,
,Pitchfork,10/10

?,0,1,2
Unnamed: 0_level_1,Peak position,_Variable,Charts
,13,1989,UK Albums Chart
,24,1989,U.S. Billboard 200
,1,1989,U.S. Top R&B/Hip-Hop Albums

?,0,1,2
Unnamed: 0_level_1,Region,Certification,Certified units /sales
,United States ( RIAA ),Platinum,"1,000,000 ^"

?,0,1,2
Unnamed: 0_level_1,Peak position,_Variable,Chart
,72,2000,Australia ( ARIA )
,11,2000,US Modern Rock Tracks ( Billboard )
,27,2000,US Mainstream Rock Tracks ( Billboard )

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Label,Format,Catalog
,Japan,"July 17, 2002",Pioneer LDC,12cmCD,PICL-1257
,Japan,"July 26, 2006",Geneon Entertainment,12cmCD,GNCL-1083

?,0,1,2
Unnamed: 0_level_1,Peak position,_Variable,Charts
,4,2010,Japan Billboard Top Albums Sales
,2,2010,Japan Oricon daily albums
,3,2010,Japan Oricon weekly albums

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Format,Distributing Label,Catalogue codes
,Japan,"March 17, 2010 ( 2010-03-17 )","CD , limited edition CD, digital download",Victor Entertainment,"VICL-63556, VICL-63557"
,South Korea,"March 23, 2010 ( 2010-03-23 )",digital download,J-Box Entertainment,
,Taiwan,"April 9, 2010 ( 2010-04-09 )",CD,Rock Records,GUT2307
,Japan,"March 18, 2015 ( 2015-03-18 )",lossless digital download,Victor,VEAHD-10617
,Japan,"August 5, 2015 ( 2015-08-05 )",LP record,Victor,VIJL-60151~2

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Label,Format,Catalog
,Japan,"July 11, 2001",Pioneer LDC,12cmCD,PICL-1219
,Japan,"July 26, 2006",Geneon Entertainment,12cmCD,GNCL-1082

?,0,1
Unnamed: 0_level_1,Charts,Peak position
,Japan Billboard Top Albums Sales,1
,Japan Oricon daily albums,1
,Japan Oricon weekly albums,1
,Japan Oricon monthly albums,5
,Japan Oricon yearly albums,31

?,0,1
Unnamed: 0_level_1,Chart,Amount
,Oricon physical sales,176000
,RIAJ physical certification,"Gold (100,000+)"

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Format,Distributing Label,Catalogue codes
,Japan,"March 13, 2013 ( 2013-03-13 )","CD , CD/ DVD , CD/ Blu-ray , digital download",Victor Entertainment,"VICL-63999, VIZL-519, VIZL-520"
,Taiwan,"April 12, 2013 ( 2013-04-12 )",CD,Rock Records,GUT2418.4
,South Korea,"May 29, 2013 ( 2013-05-29 )",digital download,J-Box Entertainment,
,Japan,"March 18, 2015 ( 2015-03-18 )",lossless digital download,Victor,VEAHD-10615
,Japan,"August 5, 2015 ( 2015-08-05 )",LP record,Victor,VIJL-60153~4

?,0,1,2,3
Unnamed: 0_level_1,Release date,Version,_Variable,_Value
,25 April 1984,Original release,Oricon Weekly Albums Chart,1
,19 November 2014,30th Anniversary Edition,Oricon Weekly Albums Chart,13

?,0,1,2
Unnamed: 0_level_1,Peak position,_Variable,Charts
,45,2008,Japan Billboard Adult Contemporary Airplay
,9,2008,Japan Billboard Japan Hot 100
,35,2008,Japan Oricon weekly singles

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Format,Distributing Label,Catalog codes
,Japan,"November 12, 2008 ( 2008-11-12 )","ringtone, digital download",BabeStar Label,VEAML-22686
,Japan,"December 10, 2008 ( 2008-12-10 )","CD single , limited edition CD single",BabeStar Label,"VICB-35013, VICB-35014"
,Japan,"December 17, 2008 ( 2008-12-17 )",rental CD single,BabeStar Label,"VICB-35013, VICB-35014"
,South Korea,"April 13, 2009 ( 2009-04-13 )",digital download (EP),J-Box Entertainment,

?,0,1,2
Unnamed: 0_level_1,Peak position,_Variable,Charts
,6,2013,Japan Billboard Adult Contemporary Airplay
,1,2013,Japan Billboard Japan Hot 100
,3,2013,Japan Oricon daily singles
,4,2013,Japan Oricon weekly singles

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Format,Distributing Label,Catalog codes
,Japan,"January 13, 2013 ( 2013-01-13 )",Ringtone,Victor Entertainment,
,Japan,"January 14, 2013 ( 2013-01-14 )",Radio add date,Victor Entertainment,
,Japan,"January 23, 2013 ( 2013-01-23 )","CD single , digital download, rental CD",Victor Entertainment,"VICL-36753, VICL-36754"
,South Korea,"February 20, 2013 ( 2013-02-20 )",Digital download,J-Box Entertainment,

?,0,1,2
Unnamed: 0_level_1,Peak position,_Variable,Charts
,9,2011,Japan Billboard Adult Contemporary Airplay
,7,2011,Japan Billboard Japan Hot 100
,6,2011,Japan Oricon weekly singles

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Format,Distributing Label,Catalog codes
,Japan,"February 21, 2011 ( 2011-02-21 )",radio add date,Victor Entertainment,
,Japan,"February 23, 2011 ( 2011-02-23 )",ringtone,Victor Entertainment,
,Japan,"March 9, 2011 ( 2011-03-09 )",digital download,Victor Entertainment,
,Japan,"March 16, 2011 ( 2011-03-16 )","CD single , limited edition CD Extra single",Victor Entertainment,"VICL-36623, VICL-36624"
,Japan,"April 2, 2011 ( 2011-04-02 )",rental CD single,Victor Entertainment,VICL-36624

?,0,1,2
Unnamed: 0_level_1,Peak position,_Variable,Charts
,6,2012,Japan Billboard Adult Contemporary Airplay
,3,2012,Japan Billboard Japan Hot 100
,6,2012,Japan Oricon weekly singles
,13,2012,Japan RIAJ Digital Track Chart

?,0,1
Unnamed: 0_level_1,Chart,Amount
,Oricon physical sales,26000
,RIAJ PC download certification,"Gold (100,000+)"

?,0,1,2,3,4
Unnamed: 0_level_1,Region,Date,Format,Distributing Label,Catalog codes
,Japan,"April 10, 2012 ( 2012-04-10 )",ringtone,Victor Entertainment,
,Japan,"April 25, 2012 ( 2012-04-25 )",radio add date,Victor Entertainment,
,Japan,"May 15, 2012 ( 2012-05-15 )",digital download,Victor Entertainment,
,Japan,"May 30, 2012 ( 2012-05-30 )",CD single,Victor Entertainment,VICL-36705
,Japan,"June 16, 2012 ( 2012-06-16 )",rental CD single,Victor Entertainment,VICL-36705

?,0,1,2
Unnamed: 0_level_1,Peak position,_Variable,Charts
,1,2012,Japan Billboard Adult Contemporary Airplay
,2,2012,Japan Billboard Japan Hot 100
,5,2012,Japan Oricon weekly singles
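
In the reshaped tables above, a header such as `Chart (1972)` has been split into a `Chart` column and a `_Variable` column that holds the year. The sketch below illustrates that idea in plain Python. It is not takco's implementation: the helper `unpivot_header_value`, the regular expression, and the list-of-lists table layout are all assumptions made for this example, and the resulting column order differs from the takco output.

```python
import re

# Hypothetical pattern: a header name followed by a parenthesised value,
# e.g. "Chart (1972)" -> name "Chart", value "1972".
HEADER_WITH_VALUE = re.compile(r'^(?P<name>.+?)\s*\((?P<value>[^()]+)\)$')


def unpivot_header_value(header, rows):
    """Move a parenthesised header value into its own `_Variable` column."""
    for i, column in enumerate(header):
        m = HEADER_WITH_VALUE.match(column)
        if m:
            # Keep the bare header name and add a `_Variable` column after it
            new_header = header[:i] + [m['name'], '_Variable'] + header[i + 1:]
            # Repeat the header value in every row
            new_rows = [row[:i] + [row[i], m['value']] + row[i + 1:] for row in rows]
            return new_header, new_rows
    # No header carries a value: leave the table unchanged
    return header, rows


new_header, new_rows = unpivot_header_value(
    ['Chart (1972)', 'Peak position'],
    [['Dutch Mega Albums Chart', '2'], ['Italian Albums Chart', '8']],
)
print(new_header)
print(new_rows)
```

Moving the year out of the header gives tables from different pages and years the same columns, which is what makes the clustering step below possible.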


In [4]:
# Cluster the reshaped tables, adding the page title as a context column
clustered = list(takco.TableSet.cluster(reshaped, addcontext = ["pgTitle"], matchers=[]))
# Show the largest tables first
clustered = sorted(clustered, key=lambda table: -table.get('numDataRows', 0))

print(f"Processed {len(clustered)} tables")
takco.preview(clustered, nrows=5, ntables=25)

Processed 53 tables


?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Position,_Variable,Chart
,Catch Bull at Four,12,1972,Australian Albums Chart
,Catch Bull at Four,21,1972,French Albums Chart
,Catch Bull at Four,44,1972,Italian Albums Chart
,Hold It Against Me,97,2011,Australia (ARIA)
,Hold It Against Me,93,2011,Belgium (Ultratop Flanders)

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart
,Catch Bull at Four,1,1972,Australian Kent Music Report Albums Chart
,Catch Bull at Four,1,1972,Canadian RPM Albums Chart
,Catch Bull at Four,2,1972,Dutch Mega Albums Chart
,Catch Bull at Four,4,1972,French SNEP Albums Chart
,Catch Bull at Four,8,1972,Italian Albums Chart

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (201
,Under the Mistletoe,6,1–18,Australian Albums ( ARIA )
,Under the Mistletoe,22,1–18,Austrian Albums ( Ö3 Austria )
,Under the Mistletoe,11,1–18,Belgian Albums ( Ultratop Flanders)
,Under the Mistletoe,17,1–18,Belgian Albums ( Ultratop Wallonia)
,Under the Mistletoe,1,1–18,Canadian Albums ( Billboard )

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Region,Certification,Certified units /sales
,3 Feet High and Rising,United States ( RIAA ),Platinum,"1,000,000 ^"
,Hold It Against Me,Australia ( ARIA ),Platinum,"70,000 ^"
,Hold It Against Me,Mexico ( AMPROFON ),Gold,"30,000 *"
,Hold It Against Me,New Zealand ( RMNZ ),Gold,"7,500 *"
,Hold It Against Me,South Korea (GAON),—,672356

?,0,1,2
Unnamed: 0_level_1,_pgTitle,Source,Rating
,Catch Bull at Four,Allmusic,
,Catch Bull at Four,Christgau's Record Guide,C
,Catch Bull at Four,Rolling Stone,(Favorable)
,3 Feet High and Rising,AllMusic,
,3 Feet High and Rising,Encyclopedia of Popular Music,

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Unnamed: 2_level_1,Region,Certification,Certified units /sales
,One Hot Minute,,Argentina ( CAPIF ),Gold,"30,000 ^"
,One Hot Minute,,Australia ( ARIA ),2× Platinum,"140,000 ^"
,One Hot Minute,,Austria ( IFPI Austria),Gold,"25,000 *"
,One Hot Minute,,Belgium ( BEA ),Gold,"25,000 *"
,One Hot Minute,,Canada ( Music Canada ),Platinum,"100,000 ^"

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (1
,One Hot Minute,1,995–2001,Australian Albums ( ARIA )
,One Hot Minute,2,995–2001,Austrian Albums ( Ö3 Austria )
,One Hot Minute,5,995–2001,Belgian Albums ( Ultratop Flanders)
,One Hot Minute,3,995–2001,Belgian Albums ( Ultratop Wallonia)
,One Hot Minute,6,995–2001,Canada Top Albums/CDs ( RPM )

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (200
,Hung Up,1,5–06,Australia ( ARIA )
,Hung Up,1,5–06,Austria ( Ö3 Austria Top 40 )
,Hung Up,1,5–06,Belgium ( Ultratop 50 Flanders)
,Hung Up,1,5–06,Belgium ( Ultratop 50 Wallonia)
,Hung Up,5,5–06,Brazil ( Crowley Broadcast Analysis )

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (197
,A Single Man (album),8,8–79,Australian Kent Music Report Albums Chart
,A Single Man (album),12,8–79,Canadian RPM Albums Chart
,A Single Man (album),16,8–79,Dutch Mega Albums Chart
,A Single Man (album),2,8–79,French SNEP Albums Chart
,A Single Man (album),13,8–79,Italian Albums Chart

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Country,Date,Format,Label
,We Are Never Ever Getting Back Together,Australia,"August 14, 2012",Digital download,Big Machine Records
,We Are Never Ever Getting Back Together,France,"August 19, 2012",Digital download,Big Machine Records
,We Are Never Ever Getting Back Together,Germany,"August 19, 2012",Digital download,Big Machine Records
,We Are Never Ever Getting Back Together,Italy,"August 19, 2012",Digital download,Big Machine Records
,We Are Never Ever Getting Back Together,Japan,"August 19, 2012",Digital download,Big Machine Records

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (2010
,Hold It Against Me,4,2011,Australia ( ARIA )
,Hold It Against Me,16,2011,Austria ( Ö3 Austria Top 40 )
,Hold It Against Me,2,2011,Belgium ( Ultratop 50 Flanders)
,Hold It Against Me,1,2011,Belgium ( Ultratop 50 Wallonia)
,Hold It Against Me,35,2011,Brazil ( Billboard Hot 100)

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Region,Date,Format,Label
,Hold It Against Me,United States,"January 10, 2011",Radio premiere,Jive Records
,Hold It Against Me,Australia,"January 11, 2011",Digital download,Jive Records
,Hold It Against Me,Canada,"January 11, 2011",Digital download,Jive Records
,Hold It Against Me,Denmark,"January 11, 2011",Digital download,Jive Records
,Hold It Against Me,Finland,"January 11, 2011",Digital download,Jive Records

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Charts
,3 Feet High and Rising,13,1989,UK Albums Chart
,3 Feet High and Rising,24,1989,U.S. Billboard 200
,3 Feet High and Rising,1,1989,U.S. Top R&B/Hip-Hop Albums
,Kikuuiki,4,2010,Japan Billboard Top Albums Sales
,Kikuuiki,2,2010,Japan Oricon daily albums

?,0,1,2,3,4,5
Unnamed: 0_level_1,_pgTitle,Region,Date,Format,Distributing Label,Catalog codes
,Sen to Rei,Japan,"November 12, 2008 ( 2008-11-12 )","ringtone, digital download",BabeStar Label,VEAML-22686
,Sen to Rei,Japan,"December 10, 2008 ( 2008-12-10 )","CD single , limited edition CD single",BabeStar Label,"VICB-35013, VICB-35014"
,Sen to Rei,Japan,"December 17, 2008 ( 2008-12-17 )",rental CD single,BabeStar Label,"VICB-35013, VICB-35014"
,Sen to Rei,South Korea,"April 13, 2009 ( 2009-04-13 )",digital download (EP),J-Box Entertainment,
,Music (Sakanaction song),Japan,"January 13, 2013 ( 2013-01-13 )",Ringtone,Victor Entertainment,

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Award,Year,Category,Result
,...Baby One More Time (song),APRA Music Awards,2000,Most Performed Foreign Work,Nominated
,...Baby One More Time (song),Billboard Music Awards,1999,Female Singles Artist of the Year,Won
,...Baby One More Time (song),CDDB Awards,1999,Most Played Single on computers,Won
,...Baby One More Time (song),Grammy Awards,2000,Best Female Pop Vocal Performance,Nominated
,...Baby One More Time (song),Guinness World Records,2000,Fastest Most No.1 Singles on UK chart by a Teenage (...),Won

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Year,Organization,Award/work,Result
,We Are Never Ever Getting Back Together,2012,Guinness World Records,Fastest Selling Single in Digital History,Won
,We Are Never Ever Getting Back Together,2013,Academy of Country Music Awards,Best Music Video,Nominated
,We Are Never Ever Getting Back Together,2013,Billboard Music Awards,Top Streaming Song (Video),Nominated
,We Are Never Ever Getting Back Together,2013,Billboard Music Awards,Top Country Song,Won
,We Are Never Ever Getting Back Together,2013,BMI Awards,Award-Winning Songs,Won

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (2
,Uprising (song),23,009–2010,Australia ( ARIA )
,Uprising (song),29,009–2010,Austria ( Ö3 Austria Top 40 )
,Uprising (song),12,009–2010,Belgium ( Ultratop 50 Flanders)
,Uprising (song),6,009–2010,Belgium ( Ultratop 50 Wallonia)
,Uprising (song),28,009–2010,Canada ( Canadian Hot 100 )

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Region,Date,Label,Format
,Stronger (What Doesn't Kill You),United States,"January 17, 2012 ( 2012-01-17 )",RCA Records,Mainstream radio
,Stronger (What Doesn't Kill You),United States,"February 3, 2012 ( 2012-02-03 )",RCA Records,Digital download – The Remixes
,Stronger (What Doesn't Kill You),Australia,"February 3, 2012 ( 2012-02-03 )",Sony Music Entertainment,Digital download – EP
,Stronger (What Doesn't Kill You),Canada,"February 3, 2012 ( 2012-02-03 )",Sony Music Entertainment,Digital download – EP
,Stronger (What Doesn't Kill You),Italy,"February 3, 2012 ( 2012-02-03 )",Sony Music Entertainment,Digital download – EP

?,0,1,2
Unnamed: 0_level_1,_pgTitle,Chart,Position
,...Baby One More Time (song),UK Singles (Official Charts Company),33
,...Baby One More Time (song),US Mainstream Top 40 ( Billboard ),25
,Greatest Hits (Elton John album),Australian Kent Music Report Albums Chart,1
,Greatest Hits (Elton John album),Canadian RPM Albums Chart,1
,Greatest Hits (Elton John album),Dutch Mega Albums Chart,16

?,0,1,2,3,4,5
Unnamed: 0_level_1,_pgTitle,Publication,Country,Accolade,Year,Rank
,Led Zeppelin IV,Mojo,UK,"""The 100 Greatest Albums Ever Made""",1996,24
,Led Zeppelin IV,Grammy Awards,US,Grammy Hall of Fame Award,1999,*
,Led Zeppelin IV,The Guitar,US,"""Album of the Millennium""",1999,2
,Led Zeppelin IV,Classic Rock,UK,"""100 Greatest Rock Albums Ever""",2001,1
,Led Zeppelin IV,Rolling Stone,US,"""500 Greatest Albums Ever""",2020,58

?,0,1,2,3,4,5
Unnamed: 0_level_1,_pgTitle,Region,Date,Format,Distributing Label,Catalogue codes
,Kikuuiki,Japan,"March 17, 2010 ( 2010-03-17 )","CD , limited edition CD, digital download",Victor Entertainment,"VICL-63556, VICL-63557"
,Kikuuiki,South Korea,"March 23, 2010 ( 2010-03-23 )",digital download,J-Box Entertainment,
,Kikuuiki,Taiwan,"April 9, 2010 ( 2010-04-09 )",CD,Rock Records,GUT2307
,Kikuuiki,Japan,"March 18, 2015 ( 2015-03-18 )",lossless digital download,Victor,VEAHD-10617
,Kikuuiki,Japan,"August 5, 2015 ( 2015-08-05 )",LP record,Victor,VIJL-60151~2

?,0,1,2
Unnamed: 0_level_1,_pgTitle,Chart,Peak position
,Don't Shoot Me I'm Only the Piano Player,Australian Kent Music Report Albums Chart,1
,Don't Shoot Me I'm Only the Piano Player,Canadian RPM Albums Chart,1
,Don't Shoot Me I'm Only the Piano Player,Danish Albums Chart,4
,Don't Shoot Me I'm Only the Piano Player,Dutch Mega Albums Chart,2
,Don't Shoot Me I'm Only the Piano Player,Finnish Albums Chart,2

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Year,Chart,Peak position,Unnamed: 5_level_1
,Thick as a Brick,1972,Australian Albums ( Kent Music Report ),1,
,Thick as a Brick,1972,Canadian Albums ( RPM ),1,
,Thick as a Brick,1972,Danish Albums ( Tracklisten ),1,
,Thick as a Brick,1972,,German Albums (Offizielle Top 100),2.0
,Thick as a Brick,2012,,German Albums (Offizielle Top 100) 40th Anniversar (...),53.0

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Year,Album / Single,_Variable,Peak Chart Positions
,Waiting for the Punchline,1995,Waiting For The Punchline,AUS,51
,Waiting for the Punchline,1995,Waiting For The Punchline,Billboard 200,40
,Waiting for the Punchline,1995,Waiting For The Punchline,UK Albums Chart,-
,Waiting for the Punchline,1995,Waiting For The Punchline,US Mainstream Rock,-
,Waiting for the Punchline,1995,Waiting For The Punchline,US Modern Rock,-

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Rating,_Variable,_Value
,Peter Gabriel (1980 album),,Source,AllMusic
,Peter Gabriel (1980 album),,Source,Chicago Sun-Times
,Peter Gabriel (1980 album),B−,Source,Christgau's Record Guide
,Peter Gabriel (1980 album),,Source,Encyclopedia of Popular Music
,Peter Gabriel (1980 album),A−,Source,Entertainment Weekly
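
The clustered tables above combine rows from several pages that share a header, with the page title added as a `_pgTitle` column. A minimal sketch of that grouping is shown below. It assumes that tables are merged only when their headers are identical, and it uses a hypothetical dict layout (`pgTitle`, `header`, `rows`) rather than takco's own table objects.

```python
from collections import defaultdict


def cluster_by_header(tables):
    """Merge tables with identical headers, keeping the page title per row."""
    clusters = defaultdict(list)
    for table in tables:
        key = tuple(table['header'])
        for row in table['rows']:
            # Prepend the page title so the origin of each row is preserved
            clusters[key].append([table['pgTitle']] + row)
    merged = [
        {'header': ['_pgTitle', *key], 'rows': rows}
        for key, rows in clusters.items()
    ]
    # Largest clusters first, as in the cell above
    return sorted(merged, key=lambda t: -len(t['rows']))


tables = [
    {'pgTitle': 'Catch Bull at Four', 'header': ['Source', 'Rating'],
     'rows': [['Rolling Stone', '(Favorable)']]},
    {'pgTitle': '3 Feet High and Rising', 'header': ['Source', 'Rating'],
     'rows': [['NME', '10/10'], ['Pitchfork', '10/10']]},
    {'pgTitle': 'Kikuuiki', 'header': ['Chart', 'Amount'],
     'rows': [['Oricon physical sales', '59000']]},
]
clusters = cluster_by_header(tables)
for cluster in clusters:
    print(cluster['header'], len(cluster['rows']))
```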


In [5]:
# Link table hyperlinks to Wikidata entities via a local title-to-ID lookup
linked = list(takco.TableSet.link(
    clustered, 
    lookup_cells = False,
    lookup = takco.link.SQLiteLookup(
        sqlitedb= 'data/wdid_wpname.sqlitedb',
        baseuri = 'http://www.wikidata.org/entity/Q',
        # Raw string, so that `\.` is read as a regex escape
        extract = r'http://[^\.]+.wikipedia.org/wiki/([^?]+)',
        fallback = takco.link.MediaWikiAPI(),
    )
))

print(f"Processed {len(linked)} tables")
takco.preview(linked, nrows=5, ntables=25)

Processed 53 tables


?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Position,_Variable,Chart
,Catch Bull at Four,12,1972,Australian Albums Chart
,Catch Bull at Four,21,1972,French Albums Chart
,Catch Bull at Four,44,1972,Italian Albums Chart
,Hold It Against Me,97,2011,Australia (ARIA)
,Hold It Against Me,93,2011,Belgium (Ultratop Flanders)

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart
,Catch Bull at Four,1,1972,Australian Kent Music Report Albums Chart
,Catch Bull at Four,1,1972,Canadian RPM Albums Chart
,Catch Bull at Four,2,1972,Dutch Mega Albums Chart
,Catch Bull at Four,4,1972,French SNEP Albums Chart
,Catch Bull at Four,8,1972,Italian Albums Chart

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (201
,Under the Mistletoe,6,1–18,Australian Albums ( ARIA )
,Under the Mistletoe,22,1–18,Austrian Albums ( Ö3 Austria )
,Under the Mistletoe,11,1–18,Belgian Albums ( Ultratop Flanders)
,Under the Mistletoe,17,1–18,Belgian Albums ( Ultratop Wallonia)
,Under the Mistletoe,1,1–18,Canadian Albums ( Billboard )

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Region,Certification,Certified units /sales
,3 Feet High and Rising,United States ( RIAA ),Platinum,"1,000,000 ^"
,Hold It Against Me,Australia ( ARIA ),Platinum,"70,000 ^"
,Hold It Against Me,Mexico ( AMPROFON ),Gold,"30,000 *"
,Hold It Against Me,New Zealand ( RMNZ ),Gold,"7,500 *"
,Hold It Against Me,South Korea (GAON),—,672356

?,0,1,2
Unnamed: 0_level_1,_pgTitle,Source,Rating
,Catch Bull at Four,Allmusic,
,Catch Bull at Four,Christgau's Record Guide,C
,Catch Bull at Four,Rolling Stone,(Favorable)
,3 Feet High and Rising,AllMusic,
,3 Feet High and Rising,Encyclopedia of Popular Music,

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Unnamed: 2_level_1,Region,Certification,Certified units /sales
,One Hot Minute,,Argentina ( CAPIF ),Gold,"30,000 ^"
,One Hot Minute,,Australia ( ARIA ),2× Platinum,"140,000 ^"
,One Hot Minute,,Austria ( IFPI Austria),Gold,"25,000 *"
,One Hot Minute,,Belgium ( BEA ),Gold,"25,000 *"
,One Hot Minute,,Canada ( Music Canada ),Platinum,"100,000 ^"

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (1
,One Hot Minute,1,995–2001,Australian Albums ( ARIA )
,One Hot Minute,2,995–2001,Austrian Albums ( Ö3 Austria )
,One Hot Minute,5,995–2001,Belgian Albums ( Ultratop Flanders)
,One Hot Minute,3,995–2001,Belgian Albums ( Ultratop Wallonia)
,One Hot Minute,6,995–2001,Canada Top Albums/CDs ( RPM )

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (200
,Hung Up,1,5–06,Australia ( ARIA )
,Hung Up,1,5–06,Austria ( Ö3 Austria Top 40 )
,Hung Up,1,5–06,Belgium ( Ultratop 50 Flanders)
,Hung Up,1,5–06,Belgium ( Ultratop 50 Wallonia)
,Hung Up,5,5–06,Brazil ( Crowley Broadcast Analysis )

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (197
,A Single Man (album),8,8–79,Australian Kent Music Report Albums Chart
,A Single Man (album),12,8–79,Canadian RPM Albums Chart
,A Single Man (album),16,8–79,Dutch Mega Albums Chart
,A Single Man (album),2,8–79,French SNEP Albums Chart
,A Single Man (album),13,8–79,Italian Albums Chart

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Country,Date,Format,Label
,We Are Never Ever Getting Back Together,Australia,"August 14, 2012",Digital download,Big Machine Records
,We Are Never Ever Getting Back Together,France,"August 19, 2012",Digital download,Big Machine Records
,We Are Never Ever Getting Back Together,Germany,"August 19, 2012",Digital download,Big Machine Records
,We Are Never Ever Getting Back Together,Italy,"August 19, 2012",Digital download,Big Machine Records
,We Are Never Ever Getting Back Together,Japan,"August 19, 2012",Digital download,Big Machine Records

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (2010
,Hold It Against Me,4,2011,Australia ( ARIA )
,Hold It Against Me,16,2011,Austria ( Ö3 Austria Top 40 )
,Hold It Against Me,2,2011,Belgium ( Ultratop 50 Flanders)
,Hold It Against Me,1,2011,Belgium ( Ultratop 50 Wallonia)
,Hold It Against Me,35,2011,Brazil ( Billboard Hot 100)

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Region,Date,Format,Label
,Hold It Against Me,United States,"January 10, 2011",Radio premiere,Jive Records
,Hold It Against Me,Australia,"January 11, 2011",Digital download,Jive Records
,Hold It Against Me,Canada,"January 11, 2011",Digital download,Jive Records
,Hold It Against Me,Denmark,"January 11, 2011",Digital download,Jive Records
,Hold It Against Me,Finland,"January 11, 2011",Digital download,Jive Records

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Charts
,3 Feet High and Rising,13,1989,UK Albums Chart
,3 Feet High and Rising,24,1989,U.S. Billboard 200
,3 Feet High and Rising,1,1989,U.S. Top R&B/Hip-Hop Albums
,Kikuuiki,4,2010,Japan Billboard Top Albums Sales
,Kikuuiki,2,2010,Japan Oricon daily albums

?,0,1,2,3,4,5
Unnamed: 0_level_1,_pgTitle,Region,Date,Format,Distributing Label,Catalog codes
,Sen to Rei,Japan,"November 12, 2008 ( 2008-11-12 )","ringtone, digital download",BabeStar Label,VEAML-22686
,Sen to Rei,Japan,"December 10, 2008 ( 2008-12-10 )","CD single , limited edition CD single",BabeStar Label,"VICB-35013, VICB-35014"
,Sen to Rei,Japan,"December 17, 2008 ( 2008-12-17 )",rental CD single,BabeStar Label,"VICB-35013, VICB-35014"
,Sen to Rei,South Korea,"April 13, 2009 ( 2009-04-13 )",digital download (EP),J-Box Entertainment,
,Music (Sakanaction song),Japan,"January 13, 2013 ( 2013-01-13 )",Ringtone,Victor Entertainment,

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Award,Year,Category,Result
,...Baby One More Time (song),APRA Music Awards,2000,Most Performed Foreign Work,Nominated
,...Baby One More Time (song),Billboard Music Awards,1999,Female Singles Artist of the Year,Won
,...Baby One More Time (song),CDDB Awards,1999,Most Played Single on computers,Won
,...Baby One More Time (song),Grammy Awards,2000,Best Female Pop Vocal Performance,Nominated
,...Baby One More Time (song),Guinness World Records,2000,Fastest Most No.1 Singles on UK chart by a Teenage (...),Won

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Year,Organization,Award/work,Result
,We Are Never Ever Getting Back Together,2012,Guinness World Records,Fastest Selling Single in Digital History,Won
,We Are Never Ever Getting Back Together,2013,Academy of Country Music Awards,Best Music Video,Nominated
,We Are Never Ever Getting Back Together,2013,Billboard Music Awards,Top Streaming Song (Video),Nominated
,We Are Never Ever Getting Back Together,2013,Billboard Music Awards,Top Country Song,Won
,We Are Never Ever Getting Back Together,2013,BMI Awards,Award-Winning Songs,Won

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Peak position,_Variable,Chart (2
,Uprising (song),23,009–2010,Australia ( ARIA )
,Uprising (song),29,009–2010,Austria ( Ö3 Austria Top 40 )
,Uprising (song),12,009–2010,Belgium ( Ultratop 50 Flanders)
,Uprising (song),6,009–2010,Belgium ( Ultratop 50 Wallonia)
,Uprising (song),28,009–2010,Canada ( Canadian Hot 100 )

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Region,Date,Label,Format
,Stronger (What Doesn't Kill You),United States,"January 17, 2012 ( 2012-01-17 )",RCA Records,Mainstream radio
,Stronger (What Doesn't Kill You),United States,"February 3, 2012 ( 2012-02-03 )",RCA Records,Digital download – The Remixes
,Stronger (What Doesn't Kill You),Australia,"February 3, 2012 ( 2012-02-03 )",Sony Music Entertainment,Digital download – EP
,Stronger (What Doesn't Kill You),Canada,"February 3, 2012 ( 2012-02-03 )",Sony Music Entertainment,Digital download – EP
,Stronger (What Doesn't Kill You),Italy,"February 3, 2012 ( 2012-02-03 )",Sony Music Entertainment,Digital download – EP

?,0,1,2
Unnamed: 0_level_1,_pgTitle,Chart,Position
,...Baby One More Time (song),UK Singles (Official Charts Company),33
,...Baby One More Time (song),US Mainstream Top 40 ( Billboard ),25
,Greatest Hits (Elton John album),Australian Kent Music Report Albums Chart,1
,Greatest Hits (Elton John album),Canadian RPM Albums Chart,1
,Greatest Hits (Elton John album),Dutch Mega Albums Chart,16

?,0,1,2,3,4,5
Unnamed: 0_level_1,_pgTitle,Publication,Country,Accolade,Year,Rank
,Led Zeppelin IV,Mojo,UK,"""The 100 Greatest Albums Ever Made""",1996,24
,Led Zeppelin IV,Grammy Awards,US,Grammy Hall of Fame Award,1999,*
,Led Zeppelin IV,The Guitar,US,"""Album of the Millennium""",1999,2
,Led Zeppelin IV,Classic Rock,UK,"""100 Greatest Rock Albums Ever""",2001,1
,Led Zeppelin IV,Rolling Stone,US,"""500 Greatest Albums Ever""",2020,58

?,0,1,2,3,4,5
Unnamed: 0_level_1,_pgTitle,Region,Date,Format,Distributing Label,Catalogue codes
,Kikuuiki,Japan,"March 17, 2010 ( 2010-03-17 )","CD , limited edition CD, digital download",Victor Entertainment,"VICL-63556, VICL-63557"
,Kikuuiki,South Korea,"March 23, 2010 ( 2010-03-23 )",digital download,J-Box Entertainment,
,Kikuuiki,Taiwan,"April 9, 2010 ( 2010-04-09 )",CD,Rock Records,GUT2307
,Kikuuiki,Japan,"March 18, 2015 ( 2015-03-18 )",lossless digital download,Victor,VEAHD-10617
,Kikuuiki,Japan,"August 5, 2015 ( 2015-08-05 )",LP record,Victor,VIJL-60151~2

?,0,1,2
Unnamed: 0_level_1,_pgTitle,Chart,Peak position
,Don't Shoot Me I'm Only the Piano Player,Australian Kent Music Report Albums Chart,1
,Don't Shoot Me I'm Only the Piano Player,Canadian RPM Albums Chart,1
,Don't Shoot Me I'm Only the Piano Player,Danish Albums Chart,4
,Don't Shoot Me I'm Only the Piano Player,Dutch Mega Albums Chart,2
,Don't Shoot Me I'm Only the Piano Player,Finnish Albums Chart,2

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Year,Chart,Peak position,Unnamed: 5_level_1
,Thick as a Brick,1972,Australian Albums ( Kent Music Report ),1,
,Thick as a Brick,1972,Canadian Albums ( RPM ),1,
,Thick as a Brick,1972,Danish Albums ( Tracklisten ),1,
,Thick as a Brick,1972,,German Albums (Offizielle Top 100),2.0
,Thick as a Brick,2012,,German Albums (Offizielle Top 100) 40th Anniversar (...),53.0

?,0,1,2,3,4
Unnamed: 0_level_1,_pgTitle,Year,Album / Single,_Variable,Peak Chart Positions
,Waiting for the Punchline,1995,Waiting For The Punchline,AUS,51
,Waiting for the Punchline,1995,Waiting For The Punchline,Billboard 200,40
,Waiting for the Punchline,1995,Waiting For The Punchline,UK Albums Chart,-
,Waiting for the Punchline,1995,Waiting For The Punchline,US Mainstream Rock,-
,Waiting for the Punchline,1995,Waiting For The Punchline,US Modern Rock,-

?,0,1,2,3
Unnamed: 0_level_1,_pgTitle,Rating,_Variable,_Value
,Peter Gabriel (1980 album),,Source,AllMusic
,Peter Gabriel (1980 album),,Source,Chicago Sun-Times
,Peter Gabriel (1980 album),B−,Source,Christgau's Record Guide
,Peter Gabriel (1980 album),,Source,Encyclopedia of Popular Music
,Peter Gabriel (1980 album),A−,Source,Entertainment Weekly
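
The `extract` pattern passed to the lookup in the linking cell captures the page title from a Wikipedia URL. The sketch below shows what the pattern matches, assuming it is applied from the start of the URL with `re.match`; how takco applies it internally is not shown in this notebook. The helper `extract_title` is hypothetical.

```python
import re

# Same pattern as the `extract` argument in the linking cell
WIKI_TITLE = re.compile(r'http://[^\.]+.wikipedia.org/wiki/([^?]+)')


def extract_title(url):
    """Return the page title captured from a Wikipedia URL, or None."""
    m = WIKI_TITLE.match(url)
    return m.group(1) if m else None


title = extract_title('http://en.wikipedia.org/wiki/Hung_Up?action=raw')
no_title = extract_title('https://en.wikipedia.org/wiki/Hung_Up')
print(title, no_title)
```

The capture group stops at the first `?`, so query parameters are dropped. The pattern starts with `http://`, so under the `re.match` assumption an `https://` URL is not matched.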


In [None]:
%%time
# Search Wikidata through a local Trident store
searcher = takco.link.RDFSearcher(
    typeProperties = ["http://www.wikidata.org/prop/direct/P31"],
    statementURIprefix = "http://www.wikidata.org/entity/statement/",
    store_classname = 'takco.link.Trident',
    store_kwargs = {'configuration': "/export/scratch1/home/kruit/20200713-prop-skos"}
)

# Type columns using "instance of" (P31)
typer = takco.link.EntityTyper(
    db = searcher, 
    type_prop = "http://www.wikidata.org/prop/direct/P31",
    cover_threshold = 0.2,
)
typed = list(takco.TableSet.coltypes(linked, typer=typer))
# Integrate the typed tables
integrated = list(takco.TableSet.integrate(typed, pfd_threshold = 0.95, db=searcher))
print(f"Processed {len(integrated)} tables")

# Show the largest tables first
integrated = sorted(integrated, key=lambda table: -table.get('numDataRows', 0))
takco.preview(integrated, nrows=5, ntables=25)

In [None]:
# Extract triples; tables without a 'triples' entry count as zero
triples = takco.TableSet.triples(integrated)
print(f"Extracted {sum(len(table.get('triples', [])) for table in triples)} triples")