In [1]:
# this notebook converts the CSV to ES mapping

In [2]:
# imports
from datetime import datetime
import hashlib
import json
import sys
import csv
import os
import pandas as pd
import re
import time
from gensim.summarization.summarizer import summarize
from gensim.summarization import keywords

In [3]:
# some long text
# source: https://www.kaggle.com/c/stanford-covid-vaccine
text1 = '''
Winning the fight against the COVID-19 pandemic will require an effective vaccine that can be equitably and widely distributed. Building upon decades of research has allowed scientists to accelerate the search for a vaccine against COVID-19, but every day that goes by without a vaccine has enormous costs for the world nonetheless. We need new, fresh ideas from all corners of the world. Could online gaming and crowdsourcing help solve a worldwide pandemic? Pairing scientific and crowdsourced intelligence could help computational biochemists make measurable progress.
mRNA vaccines have taken the lead as the fastest vaccine candidates for COVID-19, but currently, they face key potential limitations. One of the biggest challenges right now is how to design super stable messenger RNA molecules (mRNA). Conventional vaccines (like your seasonal flu shots) are packaged in disposable syringes and shipped under refrigeration around the world, but that is not currently possible for mRNA vaccines.
Researchers have observed that RNA molecules have the tendency to spontaneously degrade. This is a serious limitation--a single cut can render the mRNA vaccine useless. Currently, little is known on the details of where in the backbone of a given RNA is most prone to being affected. Without this knowledge, current mRNA vaccines against COVID-19 must be prepared and shipped under intense refrigeration, and are unlikely to reach more than a tiny fraction of human beings on the planet unless they can be stabilized.
The Eterna community, led by Professor Rhiju Das, a computational biochemist at Stanford’s School of Medicine, brings together scientists and gamers to solve puzzles and invent medicine. Eterna is an online video game platform that challenges players to solve scientific problems such as mRNA design through puzzles. The solutions are synthesized and experimentally tested at Stanford by researchers to gain new insights about RNA molecules. The Eterna community has previously unlocked new scientific principles, made new diagnostics against deadly diseases, and engaged the world’s most potent intellectual resources for the betterment of the public. The Eterna community has advanced biotechnology through its contribution in over 20 publications, including advances in RNA biotechnology.
In this competition, we are looking to leverage the data science expertise of the Kaggle community to develop models and design rules for RNA degradation. Your model will predict likely degradation rates at each base of an RNA molecule, trained on a subset of an Eterna dataset comprising over 3000 RNA molecules (which span a panoply of sequences and structures) and their degradation rates at each position. We will then score your models on a second generation of RNA sequences that have just been devised by Eterna players for COVID-19 mRNA vaccines. These final test sequences are currently being synthesized and experimentally characterized at Stanford University in parallel to your modeling efforts -- Nature will score your models!
Improving the stability of mRNA vaccines was a problem that was being explored before the pandemic but was expected to take many years to solve. Now, we must solve this deep scientific challenge in months, if not weeks, to accelerate mRNA vaccine research and deliver a refrigerator-stable vaccine against SARS-CoV-2, the virus behind COVID-19. The problem we are trying to solve has eluded academic labs, industry R&D groups, and supercomputers, and so we are turning to you. To help, you can join the team of video game players, scientists, and developers at Eterna to unlock the key in our fight against this devastating pandemic. 
'''

# and a short one
text2 = 'The quick brown fox jumps over the lazy dog'

In [4]:
# function to count words
def word_count(text):
    if isinstance(text, str):
        s = text.split(' ')
        return len(s)
    else:
        return 0

print('words:', word_count(text1))
print('words:', word_count(text2))
print('words:', word_count(None))

words: 564
words: 9
words: 0
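
`text.split(' ')` splits on single spaces only, so the newlines between the paragraphs of `text1` do not separate words, and runs of spaces produce empty strings that are counted as words. A minimal sketch of a whitespace-aware variant follows; the name `word_count_ws` is hypothetical and not part of the notebook.

```python
def word_count_ws(text):
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and never yields empty strings
    if isinstance(text, str):
        return len(text.split())
    return 0

print('words:', word_count_ws('The quick  brown\nfox'))
print('words:', word_count_ws(None))
```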


In [5]:
# function to count sentences
def sentence_count(text):
    if isinstance(text, str):
        s = text.split('. ')
        return len(s)
    else:
        return 0

print('sentences:', sentence_count(text1))
print('sentences:', sentence_count(text2))
print('sentences:', sentence_count(None))

sentences: 20
sentences: 1
sentences: 0
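
Splitting on `'. '` misses sentences that end in `?` or `!`, and sentences whose period is followed by a newline. A hedged sketch with a regular expression is shown here; `sentence_count_re` is a hypothetical name, and abbreviations such as "e.g." are still miscounted.

```python
import re

def sentence_count_re(text):
    if not isinstance(text, str):
        return 0
    # split on whitespace that follows ., ! or ? and drop empty pieces
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return len([p for p in parts if p])

print('sentences:', sentence_count_re('Hello there. How are you?\nFine!'))
```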


In [6]:
# extractive summarization

In [7]:
# extractive summary with gensim's summarize() (despite the nltk_ prefix): keep about `ratio` of the sentences
def nltk_ratio(text, ratio=0.25):
    return summarize(text, ratio=ratio)

sum_nltk_ratio = nltk_ratio(text1, ratio=0.25)
print('words:', word_count(sum_nltk_ratio))
print(sum_nltk_ratio)

words: 139
Eterna is an online video game platform that challenges players to solve scientific problems such as mRNA design through puzzles.
The solutions are synthesized and experimentally tested at Stanford by researchers to gain new insights about RNA molecules.
We will then score your models on a second generation of RNA sequences that have just been devised by Eterna players for COVID-19 mRNA vaccines.
Improving the stability of mRNA vaccines was a problem that was being explored before the pandemic but was expected to take many years to solve.
Now, we must solve this deep scientific challenge in months, if not weeks, to accelerate mRNA vaccine research and deliver a refrigerator-stable vaccine against SARS-CoV-2, the virus behind COVID-19.
To help, you can join the team of video game players, scientists, and developers at Eterna to unlock the key in our fight against this devastating pandemic.


In [8]:
# extractive summary with gensim's summarize(), limited to roughly `word_count` words
def nltk_count(text, word_count=100):
    return summarize(text, word_count=word_count)

sum_nltk_count = nltk_count(text1, word_count=100)
print('words:', word_count(sum_nltk_count))
print(sum_nltk_count)

words: 98
Eterna is an online video game platform that challenges players to solve scientific problems such as mRNA design through puzzles.
We will then score your models on a second generation of RNA sequences that have just been devised by Eterna players for COVID-19 mRNA vaccines.
Now, we must solve this deep scientific challenge in months, if not weeks, to accelerate mRNA vaccine research and deliver a refrigerator-stable vaccine against SARS-CoV-2, the virus behind COVID-19.
To help, you can join the team of video game players, scientists, and developers at Eterna to unlock the key in our fight against this devastating pandemic.

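
The `gensim.summarization` module was removed in gensim 4.0, so the two cells above need gensim 3.x. As a rough stand-in, the sketch below scores sentences by average word frequency and keeps the best ones in their original order. This is a plain frequency heuristic, not the algorithm gensim uses, and `freq_summarize` is a hypothetical name.

```python
import re
from collections import Counter

def freq_summarize(text, n_sentences=3):
    # naive sentence split on ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    # document-wide word frequencies; very short words are ignored
    words = re.findall(r'[a-z0-9]+', text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r'[a-z0-9]+', sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # rank sentence indices by score, then restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return ' '.join(sentences[i] for i in keep)

demo = 'Cats purr. Cats chase mice. Dogs bark loudly.'
print(freq_summarize(demo, n_sentences=2))
```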

In [9]:
# abstractive summarization
# https://www.machinelearningplus.com/nlp/text-summarization-approaches-nlp-example/

In [10]:
# BART
# Importing the model
from transformers import BartForConditionalGeneration, BartTokenizer, BartConfig

In [11]:
'''
# Loading the model and tokenizer for bart-large-cnn
tokenizer=BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model=BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
#'''

In [12]:
'''
# Encoding the inputs and passing them to model.generate()
def bart(text):
    inputs = tokenizer.batch_encode_plus([text],return_tensors='pt')
    summary_ids = model.generate(inputs['input_ids'], early_stopping=True)

    # Decoding and printing the summary
    bart_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    
    return bart_summary

# long text
start = time.time()
sum_bart_l = bart(text1)
end = time.time()

print('### long text ###')
print('runtime:', end-start)
print('words:', word_count(sum_bart_l))
print('sentences:', sentence_count(sum_bart_l))
print(sum_bart_l)
print('')

# short text
print('### short text ###')
start = time.time()
sum_bart_s = bart(text2)
end = time.time()

print('runtime:', end-start)
print('words:', word_count(sum_bart_s))
print('sentences:', sentence_count(sum_bart_s))
print(sum_bart_s)
#'''

In [13]:
# T5
# https://towardsdatascience.com/summarize-reddit-comments-using-t5-bart-gpt-2-xlnet-models-a3e78a5ab944
from transformers import T5Tokenizer, T5ForConditionalGeneration
model = T5ForConditionalGeneration.from_pretrained('t5-base')
tokenizer = T5Tokenizer.from_pretrained('t5-base')

Some weights of the model checkpoint at t5-base were not used when initializing T5ForConditionalGeneration: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [14]:
def t5(text):
    # T5 is prompted with a task prefix; "summarize: " selects summarization
    preprocessed_text = "summarize: " + text
    # inputs longer than 512 tokens are truncated
    tokens_input = tokenizer.encode(preprocessed_text, return_tensors="pt", max_length=512, truncation=True)
    summary_ids = model.generate(tokens_input, min_length=100, max_length=180, length_penalty=4.0)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

'''
# long text
start = time.time()
sum_t5_l = t5(text1)
end = time.time()

print('### long text ###')
print('runtime:', end-start)
print('words:', word_count(sum_t5_l))
print('sentences:', sentence_count(sum_t5_l))
print(sum_t5_l)
print('')

# short text
print('### short text ###')
start = time.time()
sum_t5_s = t5(text2)
end = time.time()

print('runtime:', end-start)
print('words:', word_count(sum_t5_s))
print('sentences:', sentence_count(sum_t5_s))
print(sum_t5_s)
#'''

In [15]:
# industry categories

# https://www.census.gov/programs-surveys/aces/information/iccl.html
cat_sic = ['Agriculture','Forestry','Fishing','Mining','Construction','Manufacturing','Transportation','Communications','Electric','Gas','Sanitary','Wholesale Trade','Retail Trade','Finance','Insurance','Real Estate','Services','Public Administration']
# https://www.marketing91.com/19-types-of-business-industries/
cat_19 = ['Aerospace','Transport','Computer','Telecommunication','Agriculture','Construction','Education','Pharmaceutical','Food','Health care','Hospitality','Entertainment','News Media','Energy','Manufacturing','Music','Mining','Worldwide web','Electronics']
# https://simplicable.com/new/industries
cat_simple = ['Advertising','Agriculture','Communication','Construction','Creative','Education','Entertainment','Fashion','Finance','Health care','Information Technology','Manufacturing','Media','Retail','Research','Robotics','Space']

cat = ['Accommodation & Food','Accounting','Agriculture','Banking & Insurance','Biotechnological & Life Sciences','Construction & Engineering','Economics','Education & Research','Emergency & Relief','Finance','Government and Public Works','Healthcare','Justice, Law and Regulations','Manufacturing','Media & Publishing','Miscellaneous','Physics','Real Estate, Rental & Leasing','Utilities','Wholesale & Retail']
subcat = ['Failure','Food','Fraud','General','Genomics','Insurance and Risk','Judicial Applied','Life-sciences','Machine Learning','Maintenance','Management and Operations','Marketing','Material Science','Physical','Policy and Regulatory','Politics','Preventative and Reactive','Quality','Real Estate','Rental & Leasing','Restaurant','Retail','School','Sequencing','Social Policies','Student','Textual Analysis','Tools','Tourism','Trading & Investment','Transportation','Valuation','Water & Pollution','Wholesale']

In [16]:
# zero shot classification
# https://towardsdatascience.com/zero-shot-text-classification-with-hugging-face-7f533ba83cd6
from transformers import pipeline
classifier = pipeline("zero-shot-classification")

Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartModel: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BartModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification m

In [17]:
'''
# test classification with the nltk_count summary (100 words)

s = nltk_count(text1, word_count=100)

start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat_19)
res_sic = classifier(s, cat_sic, multi_class=True)
end = time.time()

print('runtime sic:', end-start)
#print(res_sic)
print(res_sic['labels'][0:3])
print(res_sic['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat_19)
res_c19 = classifier(s, cat_19, multi_class=True)
end = time.time()

print('runtime c19:', end-start)
#print(res_c19)
print(res_c19['labels'][0:3])
print(res_c19['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat_19)
res_simple = classifier(s, cat_simple, multi_class=True)
end = time.time()

print('runtime simple:', end-start)
#print(res_simple)
print(res_simple['labels'][0:3])
print(res_simple['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat)
res_simple = classifier(s, cat, multi_class=True)
end = time.time()

print('runtime category:', end-start)
#print(res_simple)
print(res_simple['labels'][0:3])
print(res_simple['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), subcat)
res_simple = classifier(s, subcat, multi_class=True)
end = time.time()

print('runtime subcategory:', end-start)
#print(res_simple)
print(res_simple['labels'][0:3])
print(res_simple['scores'][0:3])
#'''

In [18]:
'''
# test classification with t5
start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat_19)
res_sic = classifier(sum_t5_l, cat_sic, multi_class=True)
end = time.time()

print('runtime sic:', end-start)
#print(res_sic)
print(res_sic['labels'][0:3])
print(res_sic['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat_19)
res_c19 = classifier(sum_t5_l, cat_19, multi_class=True)
end = time.time()

print('runtime c19:', end-start)
#print(res_c19)
print(res_c19['labels'][0:3])
print(res_c19['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat_19)
res_simple = classifier(sum_t5_l, cat_simple, multi_class=True)
end = time.time()

print('runtime simple:', end-start)
#print(res_simple)
print(res_simple['labels'][0:3])
print(res_simple['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat)
res_simple = classifier(sum_t5_l, cat, multi_class=True)
end = time.time()

print('runtime category:', end-start)
#print(res_simple)
print(res_simple['labels'][0:3])
print(res_simple['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), subcat)
res_simple = classifier(sum_t5_l, subcat, multi_class=True)
end = time.time()

print('runtime subcategory:', end-start)
#print(res_simple)
print(res_simple['labels'][0:3])
print(res_simple['scores'][0:3])
#'''

In [19]:
'''
# test classification with bart

start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat_19)
res_sic = classifier(sum_bart_l, cat_sic, multi_class=True)
end = time.time()

print('runtime sic:', end-start)
#print(res_sic)
print(res_sic['labels'][0:3])
print(res_sic['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat_19)
res_c19 = classifier(sum_bart_l, cat_19, multi_class=True)
end = time.time()

print('runtime c19:', end-start)
#print(res_c19)
print(res_c19['labels'][0:3])
print(res_c19['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat_19)
res_simple = classifier(sum_bart_l, cat_simple, multi_class=True)
end = time.time()

print('runtime simple:', end-start)
#print(res_simple)
print(res_simple['labels'][0:3])
print(res_simple['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), cat)
res_simple = classifier(sum_bart_l, cat, multi_class=True)
end = time.time()

print('runtime category:', end-start)
#print(res_simple)
print(res_simple['labels'][0:3])
print(res_simple['scores'][0:3])


start = time.time()
#res = classifier(nltk_count(text1, word_count=200), subcat)
res_simple = classifier(sum_bart_l, subcat, multi_class=True)
end = time.time()

print('runtime subcategory:', end-start)
#print(res_simple)
print(res_simple['labels'][0:3])
print(res_simple['scores'][0:3])
#'''

In [20]:
# category function
def categorize(text, categories, first=True, treshold=0):
    # note: the keyword is spelled `treshold` (sic); kept so existing calls keep working
    start = time.time()
    res = classifier(text, categories, multi_class=True)
    #print(res)
    dur = round(time.time() - start, 3)
    if first:
        # best label only; category and score are None when it scores below the threshold
        if res['scores'][0] >= treshold:
            return {'category': res['labels'][0], 'score': res['scores'][0], 'runtime': dur}
        return {'category': None, 'score': None, 'runtime': dur}
    # all labels that reach the threshold, as {label: score}
    return {label: score for label, score in zip(res['labels'], res['scores']) if score >= treshold}

#print(categorize(nltk_count(text1), cat, first=False, treshold=0.5))
#print(categorize(nltk_count(text1), cat, first=True, treshold=0.9))
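
The threshold logic in `categorize` can be tried without loading the model. The sketch below applies the same two filters to a hand-written result dict with the `labels` and `scores` keys that `categorize` reads. The helper names and all scores are made up for illustration.

```python
def top_category(res, threshold=0.0):
    # labels and scores are assumed to be sorted best-first
    label, score = res['labels'][0], res['scores'][0]
    return (label, score) if score >= threshold else (None, None)

def categories_above(res, threshold=0.0):
    # keep every label whose score reaches the threshold
    return {l: s for l, s in zip(res['labels'], res['scores']) if s >= threshold}

# hypothetical classifier output, not a real model result
fake_res = {
    'sequence': 'some text',
    'labels': ['Healthcare', 'Education & Research', 'Finance'],
    'scores': [0.92, 0.55, 0.03],
}

print(top_category(fake_res, threshold=0.9))
print(categories_above(fake_res, threshold=0.5))
```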

In [21]:
# language detection
# https://towardsdatascience.com/how-to-detect-and-translate-languages-for-nlp-project-dfd52af0c3b5
from langdetect import detect, detect_langs, DetectorFactory

language_codes = {'af': 'afrikaans', 'sq': 'albanian', 'am': 'amharic', 'ar': 'arabic', 'hy': 'armenian', 'az': 'azerbaijani', 'eu': 'basque', 'be': 'belarusian', 'bn': 'bengali', 'bs': 'bosnian', 'bg': 'bulgarian', 'ca': 'catalan', 'ceb': 'cebuano', 'ny': 'chichewa', 'zh-cn': 'chinese (simplified)', 'zh-tw': 'chinese (traditional)', 'co': 'corsican', 'hr': 'croatian', 'cs': 'czech', 'da': 'danish', 'nl': 'dutch', 'en': 'english', 'eo': 'esperanto', 'et': 'estonian', 'tl': 'filipino', 'fi': 'finnish', 'fr': 'french', 'fy': 'frisian', 'gl': 'galician', 'ka': 'georgian', 'de': 'german', 'el': 'greek', 'gu': 'gujarati', 'ht': 'haitian creole', 'ha': 'hausa', 'haw': 'hawaiian', 'iw': 'hebrew', 'hi': 'hindi', 'hmn': 'hmong', 'hu': 'hungarian', 'is': 'icelandic', 'ig': 'igbo', 'id': 'indonesian', 'ga': 'irish', 'it': 'italian', 'ja': 'japanese', 'jw': 'javanese', 'kn': 'kannada', 'kk': 'kazakh', 'km': 'khmer', 'ko': 'korean', 'ku': 'kurdish (kurmanji)', 'ky': 'kyrgyz', 'lo': 'lao', 'la': 'latin', 'lv': 'latvian', 'lt': 'lithuanian', 'lb': 'luxembourgish', 'mk': 'macedonian', 'mg': 'malagasy', 'ms': 'malay', 'ml': 'malayalam', 'mt': 'maltese', 'mi': 'maori', 'mr': 'marathi', 'mn': 'mongolian', 'my': 'myanmar (burmese)', 'ne': 'nepali', 'no': 'norwegian', 'ps': 'pashto', 'fa': 'persian', 'pl': 'polish', 'pt': 'portuguese', 'pa': 'punjabi', 'ro': 'romanian', 'ru': 'russian', 'sm': 'samoan', 'gd': 'scots gaelic', 'sr': 'serbian', 'st': 'sesotho', 'sn': 'shona', 'sd': 'sindhi', 'si': 'sinhala', 'sk': 'slovak', 'sl': 'slovenian', 'so': 'somali', 'es': 'spanish', 'su': 'sundanese', 'sw': 'swahili', 'sv': 'swedish', 'tg': 'tajik', 'ta': 'tamil', 'te': 'telugu', 'th': 'thai', 'tr': 'turkish', 'uk': 'ukrainian', 'ur': 'urdu', 'uz': 'uzbek', 'vi': 'vietnamese', 'cy': 'welsh', 'xh': 'xhosa', 'yi': 'yiddish', 'yo': 'yoruba', 'zu': 'zulu', 'fil': 'Filipino', 'he': 'Hebrew'}

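def lingo(text, simple=True):
    # a fixed seed makes langdetect deterministic across runs
    DetectorFactory.seed = 0

    if simple:
        return detect(text) #language_codes[detect(text)]
    # str() of a detect_langs() entry has the form 'code:probability';
    # the first entry is the most likely language
    code, probability = str(detect_langs(text)[0]).split(':')
    return {
        'code': code,
        'language': language_codes[code],
        'probability': probability,
    }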

sentence = "Tanzania ni nchi inayoongoza kwa utalii barani afrika"
sentence2 = "Heute schneit es."

print(lingo(sentence, simple=False))
print(lingo(sentence2))
print(lingo(text1))
print(lingo(text2))

{'code': 'sw', 'language': 'swahili', 'probability': '0.9999971210408874'}
de
en
en
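
The non-simple branch of `lingo` parses the `code:probability` string form of a langdetect result and leaves the probability as a string. A small sketch of the same parsing that returns a float instead is given here; `parse_language` is a hypothetical helper, and its input is written out by hand rather than produced by langdetect.

```python
def parse_language(candidate):
    # split from the right so only the last ':' separates code and probability
    code, prob = candidate.rsplit(':', 1)
    return {'code': code, 'probability': float(prob)}

parsed = parse_language('sw:0.9999971210408874')
print(parsed['code'], parsed['probability'] > 0.99)
```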


In [22]:
# helper functions

In [23]:
# function to rebuild a list from its string representation;
# needed when a list was written to CSV as a plain string instead of being JSON-encoded
def str_to_list(s):
    s = s.replace("'", "").replace(' ,', ',').replace(
        '[', '').replace(']', '').split(',')
    s = [i.replace('"','').strip() for i in s if i]
    return s
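
`str_to_list` splits on every comma, so an item that itself contains a comma is broken apart. When the stored string is a valid Python literal, `ast.literal_eval` can rebuild it exactly. The sketch below tries that first and falls back to simple splitting; `parse_list` is a hypothetical name.

```python
import ast

def parse_list(s):
    try:
        # works for strings like "['a', 'b']" produced by str(list)
        value = ast.literal_eval(s)
    except (ValueError, SyntaxError):
        # not a valid literal (e.g. unquoted items): fall back to comma splitting
        return [part.strip() for part in s.strip('[]').split(',') if part.strip()]
    return list(value) if isinstance(value, (list, tuple)) else [value]

print(parse_list("['GAN', 'Style Transfer']"))
print(parse_list("[GAN, CNN]"))
```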

In [24]:
# helper function to create the parent folder of a file path
def create_folder(path):
    folder = os.path.dirname(path)
    # an empty dirname means the path has no parent folder to create
    if folder and not os.path.exists(folder):
        # exist_ok=True guards against the race where the folder is
        # created by someone else between the check and makedirs
        os.makedirs(folder, exist_ok=True)
        print(path + ' created')

In [25]:
# generic store data to file function
def store_data(data, file, mode='w', toJson=False):
    if toJson:
        data = json.dumps(data)
    with open(file, mode, encoding='utf-8') as fp:
        result = fp.write(data)
        return result
    
# generic load data from file function
def load_data(file, fromJson=False):
    if os.path.isfile(file):
        with open(file, 'r', encoding='utf-8', errors="ignore") as fp:
            data = fp.read()
            if fromJson:
                data = json.loads(data)
            return data
    else:
        return 'file not found'

# test text
#print(store_data('Hello', '../data/repositories/mlart/test.txt'))
#print(load_data('../data/repositories/mlart/test.txt'))

# test json
#print(store_data({'msg':'Hello World'}, '../data/repositories/mlart/test.json', toJson=True))
#print(load_data('../data/repositories/mlart/test.json', fromJson=True))

#store_data(result[0]['html'], '../data/repositories/kaggle/notebook.html')
#store_data(result[0]['iframe'], '../data/repositories/kaggle/kernel.html')
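
`load_data` returns the string `'file not found'` for a missing file, which a caller cannot tell apart from a file that contains exactly that text. One possible alternative, sketched under the assumption that a `default` value is acceptable to callers, is shown below; `load_data_or_default` is a hypothetical name and the round trip uses a temporary folder.

```python
import json
import os
import tempfile

def load_data_or_default(file, fromJson=False, default=None):
    # return `default` instead of a sentinel string when the file is missing
    if not os.path.isfile(file):
        return default
    with open(file, 'r', encoding='utf-8', errors='ignore') as fp:
        data = fp.read()
    return json.loads(data) if fromJson else data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.json')
    with open(path, 'w', encoding='utf-8') as fp:
        fp.write(json.dumps({'msg': 'Hello World'}))
    loaded = load_data_or_default(path, fromJson=True)
    missing = load_data_or_default(os.path.join(tmp, 'nope.json'), fromJson=True)

print(loaded, missing)
```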

In [37]:
# remove emoji and bullet characters, and replace newlines with spaces
def clean_text(text):
    # Ref: https://gist.github.com/Alex-Just/e86110836f3f93fe7932290526529cd1#gistcomment-3208085
    # Ref: https://en.wikipedia.org/wiki/Unicode_block
    EMOJI_PATTERN = re.compile(
        "(["
        "\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F700-\U0001F77F"  # alchemical symbols
        "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
        "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
        "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
        "\U0001FA00-\U0001FA6F"  # Chess Symbols
        "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
        "\U00002702-\U000027B0"  # Dingbats
        "])"
    )
    text = re.sub(EMOJI_PATTERN, '', text)
    
    # additional cleanup
    text = text.replace('•','').replace('\n',' ')
    
    return text
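
Removing an emoji or bullet leaves the surrounding spaces behind, so the cleaned text can contain double spaces. The sketch below compiles a shortened pattern once at module level and collapses whitespace afterwards. It covers only three of the Unicode ranges listed above, and `clean_and_squeeze` is a hypothetical name.

```python
import re

# shortened emoji pattern for illustration (three of the ranges used above)
EMOJI = re.compile('[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F\U0001F680-\U0001F6FF]')

def clean_and_squeeze(text):
    text = EMOJI.sub('', text).replace('•', '').replace('\n', ' ')
    # collapse the whitespace runs left behind by the removals
    return re.sub(r'\s+', ' ', text).strip()

print(clean_and_squeeze('Hi \U0001F600 there\n• item'))
```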

In [42]:
tag_filter = {
    # '3D',
    '3D Ken Burns Effect': 'Ken Burns 3D',
    # '3D Photo Inpainting',
    'AI': None,
    # 'ANN',
    # 'Ableton Live',
    'Activations': None,
    'Aeriolod': None,
    # 'AlexNet',
    'Analytics  Competition': None,
    # 'Animating Landscape',
    # 'Anomaly Detection',
    'ArtBreeder': 'Artbreeder',
    # 'Artbreeder',
    # 'AttnGAN',
    # 'AutoML',
    # 'B3D',
    # 'BASNet',
    # 'Background Removal',
    # 'Bayesian',
    # 'BigGAN',
    # 'BodyPix',
    # 'Boltzmann Machine',
    # 'CAD',
    # 'CMA-ES',
    # 'CMT',
    # 'CNN',
    # 'CPPN',
    # 'CV',
    'Camera': None,
    # 'Chatbot',
    # 'Classification',
    'Classifier': 'Classification',
    # 'Clustering',
    # 'Colorization',
    'Contentless': None,
    # 'Corpus-based synthesis',
    # 'CycleGAN',
    # 'DCGAN',
    # 'DDSP',
    # 'DL',
    # 'DLIB',
    # 'DeOldify',
    # 'Decision Tree',
    'Deep Fakes': 'DeepFake',
    # 'Deep Painterly Harmonization',
    # 'DeepDream',
    # 'DeepFake',
    # 'DeepFlow',
    # 'DenseCap',
    'Depth Map': None,
    # 'Detectron',
    'Detectron2': 'Detectron',
    'Device': None,
    'Discriminator': None,
    'Document Summarization': 'Summarization',
    # 'ESR-GAN',
    # 'Edge Detection',
    # 'Expression Detection',
    'Face Alignment': 'Face Detection',
    # 'Face Detection',
    # 'Face Recognition',
    'Face Tracking': 'Face Detection',
    'Face markers': 'Face Detection',
    'Face recognition': 'Face Recognition',
    'Facial Detection': 'Face Detection',
    'Facial Recognition': 'Face Recognition',
    # 'FastPhotoStyle',
    # 'Feature Mixing',
    'Feature vectors': None,
    'Featured Code Competition': None,
    'Featured Simulation Competition': None,
    'Featured prediction Competition': None,
    # 'Federated Learning',
    # 'Few Shot Animation',
    'First Order Motion': None,
    # 'GAN',
    # 'GBM',
    # 'GPT',
    'GPT-2': 'GPT',
    # 'GRU',
    'Game of Life': None,
    # 'GauGAN',
    # 'Gaussian Mixture Model',
    # 'Genetic Algorithm',
    'Getting Started prediction Competition': None,
    # 'Gradient Ascent',
    # 'Gradient Boosting',
    # 'Gradient Smoothing',
    # 'Grannma Magnet',
    'Hardware': None,
    'Height map': None,
    'Heightfield': None,
    'Houdini': None,
    # 'Image Captioning',
    # 'Image Segmentation',
    # 'ImageJ',
    # 'ImageNet',
    # 'Inception',
    # 'Inpainting',
    'Interpolation': None,
    # 'IoT',
    'K-Means': 'K-means',
    # 'K-means',
    # 'KNN',
    # 'Ken Burns 3D',
    # 'Kolmogorov complexity',
    # 'LSTM',
    # 'Laplacian Pyramid',
    'Lenticular': 'Lenticular Printing',
    'Lenticular Print': 'Lenticular Printing',
    # 'Linear Regression',
    # 'Logistic Regression',
    'ML': None,
    # 'Machine Translation',
    'Machine translation': 'Machine Translation',
    'Magenta': None,
    # 'Markov Chain',
    'Markov Chains': 'Markov Chain',
    'Memory Mosaic': None,
    'Microphone': None,
    'Mixture Density Networks': 'MDN',
    'Multi-Domain Multi-Modality I2I translation': 'Image2Image',
    'Multi-Style Transfer': 'Style Transfer',
    # 'Music Transformer',
    # 'N-gram',
    # 'NER',
    # 'NLG',
    # 'NLP',
    # 'NLU',
    'NN': None,
    # 'NNS',
    # 'NSynth',
    'NSynth Super': 'NSynth',
    # 'Naive Bayes',
    'Nerual CA': 'Neural Cellular Automata',
    # 'Neural Cellular Automata',
    # 'Object Detection',
    # 'Occams razor',
    'Open Pose': 'OpenPose',
    # 'OpenCV',
    # 'OpenPose',
    # 'Optical Flow',
    'Optical flow': 'Optical Flow',
    'Perlin Noise': None,
    # 'Photogrammetry',
    'Photoshop': None,
    'Pix2Pix': 'Image2Image',
    'Pix2pix': 'Image2Image',
    'Pixel2style2pixel': 'Image2Image',
    'Playground Code Competition': None,
    'Playground prediction Competition': None,
    # 'Point Cloud',
    # 'PoseNet',
    # 'ProGAN',
    'Progressively Grown GAN': 'ProGAN',
    # 'Projective Non-negative Matrix Factorization',
    # 'Quantum Computer',
    # 'QuickDraw',
    # 'RL',
    # 'RNN',
    # 'Random Forest',
    # 'Raymarching',
    # 'ReLu',
    # 'Recommender',
    'Recruitment prediction Competition': None,
    'Rectifier': 'ReLu',
    # 'Regression',
    'Reinforcement Learning': 'RL',
    # 'ResNet',
    'Research Code Competition': None,
    'Research prediction Competition': None,
    'Resnet': 'ResNet',
    # 'SIFT',
    # 'SNGAN',
    # 'SOM',
    # 'Self-attention',
    'Semantic search': 'Semantic Search',
    # 'Sentiment Analysis',
    # 'Simplex Volume Maximization',
    # 'SinGAN',
    # 'SketchRNN',
    # 'Sparse Transformer',
    # 'Speech Recognition',
    # 'Speech to text',
    # 'Style Transfer',
    # 'StyleGAN',
    'StyleGAN2': 'StyleGAN',
    'StyleTransfer': 'Style Transfer',
    # 'Super Slo Mo',
    # 'Super-Resolution',
    'Super-resolution': 'Super-Resolution',
    'Superresolution': 'Super-Resolution',
    # 'Supervised Learning',
    'Support Vector Machines': 'SVM',
    'TensorFlow.js': None,
    'Tensorflow.js': None,
    'Text Classification': 'Classification',
    'Text To Speech': 'Text To Speech',
    'Text classification': 'Classification',
    'Text to Animation of Virtual Characters': 'Text to Animation',
    # 'Texture synthesis',
    # 'Transformer',
    # 'Translation',
    # 'U-Net',
    'U-net': 'U-Net',
    # 'UMAP',
    'Unsupervised learning': 'Unsupervised Learning',
    # 'VAE',
    # 'VGG',
    'VQ-VAE': 'VAE',
    # 'VR',
    'Video StyleTransfer': 'Style Transfer',
    # 'Voice Detection',
    'Voice detection': 'Voice Detection',
    # 'Watson Beat',
    # 'Wav2Lip',
    # 'WaveGAN',
    # 'Wavenet',
    'Weights': None,
    # 'Word2Vec',
    'advanced': None,
    'animals': None,
    'arts and entertainment': 'Arts and Entertainment',
    'astronomy': 'Astronomy',
    'audio data': None,
    'automobiles and vehicles': 'Automotive',
    'banking': 'Banking',
    'basketball': 'Sports',
    'bayesian statistics': None,
    'beginner': None,
    'bigquery': 'BigQuery',
    'binary classification': 'Classification',
    'biology': 'Biology',
    # 'cDCGAN',
    'california': None,
    'categorical data': None,
    'china': None,
    'classification': 'Classification',
    'clustering': 'Clustering',
    'cnn': 'CNN',
    'computer science': None,
    'computer vision': 'CV',
    'covid19': None,
    'cuml-UMAP': 'UMAP',
    'dailychallenge': None,
    'data analytics': None,
    'data cleaning': None,
    'data visualization': None,
    'decision tree': 'Decision Tree',
    'deep learning': 'DL',
    'deepflow': 'DeepFlow',
    'dimensionality reduction': 'Dimensionality Reduction',
    'diseases': 'Diseases',
    'e-commerce services': 'E-Commerce',
    'earth and nature': 'Earth and Nature',
    'employment': 'Employment',
    'ensembling': 'Ensembling',
    'environment': 'Environment',
    'exploratory data analysis': 'Exploratory Data Analysis',
    'feature engineering': 'Feature Engineering',
    'finance': 'Finance',
    'forestry': 'Forestry',
    'games': 'Games',
    'gan': 'GAN',
    'genetics': 'Genetics',
    'geospatial analysis': 'Geospatial Analysis',
    'gpu': None,
    'gradient boosting': 'Gradient Boosting',
    'health': 'Healthcare',
    'healthcare': 'Healthcare',
    'image data': None,
    'india': None,
    'intermediate': None,
    'jobs and career': None,
    'k-means': 'K-means',
    'keras': 'Keras',
    'languages': None,
    'learn': None,
    'lightgbm': 'Gradient Boosting',
    'linear regression': 'Linear Regression',
    'linguistics': None,
    'logistic regression': 'Logistic Regression',
    'lstm': 'LSTM',
    'medicine': 'Healthcare',
    'microcontroller': None,
    'model comparison': 'Model Comparison',
    'model explainability': 'Model Explainability',
    'multiclass classification': 'Classification',
    'multilabel classification': 'Classification',
    'naive bayes': 'Naive Bayes',
    'neural networks': None,
    'nlp': 'NLP',
    'ofxSelfOrganizingMap': 'Self-organizing map',
    'openFrameworks': None,
    'optimization': None,
    'outlier analysis': 'Outlier Analysis',
    'pca': None,
    'physics': 'Physics',
    'pix2code': 'Pix2Code',
    'pix2pix': 'Image2Image',
    'plants': 'Plants',
    'pollution': 'Pollution',
    'puzzles': None,
    'python': None,
    'pytorch': None,
    # 'rGMIR',
    'random forest': 'Random Forest',
    'recommender systems': 'Recommender',
    'regression': 'Regression',
    'reinforcement learning': 'RL',
    'research': None,
    'rnn': 'RNN',
    'robotics': 'Robotics',
    'sampling': None,
    'signal processing': None,
    'simulations': None,
    'spaCy': 'NLP',
    'sports': 'Sports',
    'survey analysis': None,
    'svm': 'SVM',
    # 't-SNE',
    'tabular data': None,
    'tensorflow': None,
    'text data': None,
    'text mining': None,
    'time series analysis': 'Time Series Analysis',
    'tpu': None,
    'transfer learning': 'Transfer Learning',
    'utility script': None,
    'video games': 'Games',
    'xgboost': 'Xgboost',
}

def tag_equalizer(tags):
    tags = [tag_filter.get(x, x) for x in tags]
    tags = list(filter(None, tags))
    return tags

print(tag_equalizer(['tpu', 'rnn']))

['RNN']
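A minimal sketch of the three cases the equalizer handles: a tag mapped to None is dropped, a tag with a replacement is renamed, and an unknown tag passes through unchanged. The `demo_filter` table and `demo_tag_equalizer` are hypothetical stand-ins for illustration. The order-preserving de-duplication is an addition that the original `tag_equalizer` does not perform.

```python
# Hypothetical excerpt of the tag_filter table, for illustration only.
demo_filter = {
    'tpu': None,     # mapped to None -> dropped
    'rnn': 'RNN',    # renamed to the canonical spelling
    'GPT-2': 'GPT',  # merged into a broader tag
}

def demo_tag_equalizer(tags, table=demo_filter):
    # Unknown tags fall through unchanged because .get(t, t) returns t itself.
    mapped = [table.get(t, t) for t in tags]
    # Drop None and empty strings, then de-duplicate while keeping order
    # (assumption: duplicates such as 'GPT-2' + 'GPT' are unwanted).
    return list(dict.fromkeys(t for t in mapped if t))

result = demo_tag_equalizer(['tpu', 'rnn', 'GPT-2', 'GPT', 'LSTM'])
print(result)  # ['RNN', 'GPT', 'LSTM']
```

Without the de-duplication step, 'GPT-2' and 'GPT' would both end up as 'GPT' in the same list.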


In [27]:
# mapper to convert CSV to the mapping of Elasticsearch index
def mapper(row, style):
    '''
    Adapt a CSV row to the db schema; returns None for an unknown style.

    "title"
    "summarization"
    "words"
    "sum_words"
    "link"
    "source"
    "category"
    "category_score"
    "subcategory"
    "subcategory_score"
    "tags"
    "kind"
    "ml_libs"
    "host"
    "license"
    "programming_language"
    "ml_score"
    "engagement_score"
    "date_project"
    "date_scraped"
    '''

    # kaggle competition mapping
    if style == 'kaggle_competition':
        return {
            'title': row['title'],
            'description': row['subtitle'] + row['description'],
            'link': row['link'],
            # 'category': '',
            # 'category_score': 0,
            # 'subcategory': '',
            # 'subcategory_score': 0,
            'tags': list(set(str_to_list(row['tags']) + str_to_list(row['type']))),
            'kind': ['Project', '(Competition)', '(Dataset)'],
            # 'ml_libs': str_to_list(row['ml_libs']),
            'host': 'www.kaggle.com',
            # 'license': row['license'],
            # 'programming_language': row['type'],
            # 'ml_score': 0,
            'engagement_score': row['teams_score'],
            'date_project': datetime.strptime(row['date_closed'], "%Y-%m-%d %H:%M:%S") if 'date_closed' in row else '',
            # 'date_scraped': datetime.strptime(row['scraped_at'], "%Y-%m-%d %H:%M:%S"),
            # 'ml_terms': row['ml_terms'],
            # 'score_raw': json.dumps({'views': row['views'], 'votes': row['votes'], 'score_private': row['score_private'], 'score_public': row['score_public']}),
        }
    
    # kaggle dataset mapping
    if style == 'kaggle_dataset':
        return {
            'title': row['title'],
            'description': row['subtitle'] + row['description'],
            'link': row['link'],
            # 'category': '',
            # 'category_score': 0,
            # 'subcategory': '',
            # 'subcategory_score': 0,
            'tags': list(set(str_to_list(row['tags']) + str_to_list(row['type']))),
            'kind': ['Project', '(Dataset)'],
            # 'ml_libs': str_to_list(row['ml_libs']),
            'host': 'www.kaggle.com',
            # 'license': row['license'],
            # 'programming_language': row['type'],
            # 'ml_score': 0,
            'engagement_score': row['teams_score'],
            'date_project': datetime.strptime(row['date_closed'], "%Y-%m-%d %H:%M:%S") if 'date_closed' in row else '',
            # 'date_scraped': datetime.strptime(row['scraped_at'], "%Y-%m-%d %H:%M:%S"),
            # 'ml_terms': row['ml_terms'],
            # 'score_raw': json.dumps({'views': row['views'], 'votes': row['votes'], 'score_private': row['score_private'], 'score_public': row['score_public']}),
        }
    
    # kaggle notebook mapping
    if style == 'kaggle_notebook':
        return {
            'title': row['title'],
            'description': row['description'],
            'link': row['link'],
            # 'category': '',
            # 'category_score': 0,
            # 'subcategory': '',
            # 'subcategory_score': 0,
            'tags': list(set(str_to_list(row['tags']))),
            'kind': ['Project', '(Notebook)'],
            'ml_libs': str_to_list(row['ml_libs']),
            'host': 'www.kaggle.com',
            'license': row['license'],
            'programming_language': row['type'],
            'ml_score': row['ml_detected'],
            'engagement_score': row['score_views'] if 'score_views' in row else None,
            'date_project': datetime.strptime(row['date'], "%Y-%m-%d %H:%M:%S") if row['date'] != '' else None,
            'date_scraped': datetime.strptime(row['scraped_at'], "%Y-%m-%d %H:%M:%S") if row['scraped_at'] != '' else None,
            # 'ml_terms': row['ml_terms'],
            # 'score_raw': json.dumps({'views': row['views'], 'votes': row['votes'], 'score_private': row['score_private'], 'score_public': row['score_public']}),
        }

    # github mapping
    if style == 'github':
        title = row['name'] if row['name'] != '' else row['title']
        title = title.replace('-',' ').replace('_',' ').strip()
        cat_score = 1 if row['industry'] != '' else 0
        subcat_score = 1 if row['type'] != '' else 0
        #tags = row['ml_tags'] if len(row['ml_tags']) > 0 else ''
        return {
            'title': title,
            'description': row['description2'],
            'link': row['link'],
            'category': row['industry'],
            'category_score': cat_score,
            'subcategory': row['type'],
            'subcategory_score': subcat_score,
            'tags': str_to_list(row['ml_tags']),
            'kind': 'Project',
            'ml_libs': str_to_list(row['ml_libs']),
            'host': 'www.github.com',
            'license': row['license'],
            'programming_language': row['language_primary'],
            'ml_score': row['ml_detected'],
            'engagement_score': row['stars_score'],
            'date_project': datetime.strptime(row['pushed_at'], "%Y-%m-%d %H:%M:%S"),
            'date_scraped': datetime.strptime(row['scraped_at'], "%Y-%m-%d %H:%M:%S"),
            # 'ml_terms': row['keywords'],
            # 'score_raw': json.dumps({'stars': row['stars'], 'contributors': row['contributors']}),
        }

    # mlart mapping
    if style == 'mlart':
        title = row['Title'] if row['Title'] != '' else row['title']
        cat_score = 1 if row['Theme'] != '' else 0
        subcat_score = 1 if row['Medium'] != '' else 0
        return {
            'title': title,
            'description': row['subtitle'],
            'link': row['url'],
            'category': str_to_list(row['Theme']),
            'category_score': cat_score,
            'subcategory': str_to_list(row['Medium']),
            'subcategory_score': subcat_score,
            'tags': str_to_list(row['Technology']),
            'kind': 'Showcase',
            # 'ml_libs': [],
            'host': 'mlart.co',
            # 'license': '',
            # 'programming_language': '',
            # 'ml_score': row['ml_detected'],
            # 'engagement_score': 0,
            'date_project': datetime.strptime(row['Date'], "%Y-%m-%d"),
            'date_scraped': datetime.strptime(row['scraped_at'], "%Y-%m-%d %H:%M:%S"),
            # 'score_raw': json.dumps({'days_since_featured': row['Days Since Featured']}),
        }

    # thecleverprogrammer
    if style == 'tcp':
        return {
            'title': row['title'],
            'description': row['description'],
            'link': row['link'],
            # 'category': '',
            # 'category_score': 0,
            # 'subcategory': '',
            # 'subcategory_score': 0,
            'tags': str_to_list(row['ml_tags']),
            'kind': 'Project',
            'ml_libs': str_to_list(row['ml_libs']),
            'host': 'thecleverprogrammer.com',
            # 'license': '',
            'programming_language': 'Python',
            'ml_score': row['ml_score'],
            # 'engagement_score': 0,
            'date_project': datetime.strptime(row['date'], "%Y-%m-%d %H:%M:%S"),
            'date_scraped': datetime.strptime('2020-12-20', "%Y-%m-%d"),
            # 'score_raw': json.dumps({'days_since_featured': row['Days Since Featured']}),
        }

    return None
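Several branches of the mapper call `datetime.strptime` directly, which raises on an empty or malformed string. One possible way to make this tolerant is a small helper like the sketch below. `parse_date` is a hypothetical name, not part of the notebook, and returning None for bad input is an assumption about what the index should store.

```python
from datetime import datetime

def parse_date(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a datetime, or None when the value is missing or malformed."""
    if not value:
        return None
    try:
        return datetime.strptime(value, fmt)
    except (ValueError, TypeError):
        # ValueError: string does not match fmt; TypeError: value is not a string
        return None

# Hypothetical row with one valid and one empty date column.
row = {'date': '2020-12-20 08:30:00', 'scraped_at': ''}
print(parse_date(row.get('date')))          # 2020-12-20 08:30:00
print(parse_date(row.get('scraped_at')))    # None
print(parse_date('2020-12-20', "%Y-%m-%d")) # 2020-12-20 00:00:00
```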

In [28]:
# test gpu usage
import torch
torch.cuda.is_available()

True

In [29]:
# summarization loop

In [43]:
# loop to transform data row-wise
def transform_loop(csv_in, csv_format, subfolder, quit=0, overwrite=False, inplace=True):
    
    with open(csv_in, encoding='utf-8') as csvfile:
        
        # let's store converted csv to temp-folder for analysis
        csv_out = '../data/database/csv/'
        json_out = '../data/database/json/'
        json_out_item = '../data/database/json/'+subfolder
        create_folder(json_out_item)
        df = pd.DataFrame()

        # readCSV = csv.reader(csvfile, delimiter=';')
        readCSV = csv.DictReader(csvfile, delimiter=';')
        # next(readCSV, None)  # skip the headers
        
        i = j = 0
        out = []
        
        for row in readCSV:
            # print(row)
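            row = mapper(row, csv_format)
            # mapper returns None for an unknown style; fail with a clear message
            if row is None:
                raise ValueError('unknown csv_format: ' + str(csv_format))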
            
            # check if file already exists
            link = row['link']
            md5 = hashlib.md5(link.encode("utf-8")).hexdigest()
            
            json_fp = json_out_item + md5 + '.json'
            if not os.path.isfile(json_fp) or overwrite == True:
                if overwrite == True:
                    old = load_data(json_fp, fromJson=True)
                else:
                    old = None
                
                print(i, row['link'])
                item_start = time.time()

                # clean title & description
                row['title'] = clean_text(row['title'])
                text = row['description'] = clean_text(row['description'])
                words = row['words'] = word_count(text)
                sentences = row['sentences'] = sentence_count(text)

                # create summarization
                if words > 200 and sentences > 1:
                    print('summarize')
                    
                    # nltk
                    if old is None or 'sum_nltk' not in old:
                        start = time.time()
                        row['sum_nltk'] = nltk_count(text, word_count=200)
                        end = time.time()
                        dur = round(end-start,3)

                        row['sum_nltk_words'] = word_count(row['sum_nltk'])
                        row['sum_nltk_runtime'] = dur
                        print('done (nltk)', dur, 'sec')
                    
                    # t5
                    if old is None or 'sum_t5' not in old:
                        start = time.time()
                        row['sum_t5'] = t5(text)
                        end = time.time()
                        dur = round(end-start,3)

                        row['sum_t5_words'] = word_count(row['sum_t5'])
                        row['sum_t5_runtime'] = dur
                        print('done (t5)', dur, 'sec')
                
                # detect language
                s = row['description'] if 'description' in row and row['description'] != '' else row['title']
                lang = lingo(s, simple=False)
                row['language_code'] = lang['code']
                row['language'] = lang['language']
                row['language_score'] = lang['probability']

                # equalizer
                if 'programming_language' in row and row['programming_language'] == 'Python notebook':
                    row['programming_language'] = 'Jupyter Notebook'
                    
                if 'license' in row:
                    if row['license'] == 'Apache 2.0':
                        row['license'] = 'Apache-2.0'
                    if row['license'] == 'Learn more about GitHub Sponsors':
                        row['license'] = None
                    if row['license'] == 'Unlicense':
                        row['license'] = None
                        
                row['tags'] = tag_equalizer(row['tags'])
                

                # convert datetime to string
                if 'date_project' in row:
                    row['date_project'] = str(row['date_project'])
                if 'date_scraped' in row:
                    row['date_scraped'] = str(row['date_scraped'])
                    
                # runtime
                item_end = time.time()
                item_dur = round(item_end-item_start, 3)
                row['runtime'] = item_dur

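                # DataFrame.append was removed in pandas 2.0; pd.concat works across versions
                df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)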

                # json encode
                #out.append(row)
                
                # merge only when a previous record was actually loaded
                if overwrite and inplace and old:
                    # stored fields are kept, freshly computed values win
                    row = {**old, **row}
                    drop = ['score']
                    for key in drop:
                        row.pop(key, None)
                    # restore category, subcategory and runtime
                    # (.get avoids a KeyError for styles that set no category)
                    if row.get('category') == '' and 'category' in old:
                        row['category'] = old['category']
                        row['category_score'] = old['category_score']
                    if row.get('subcategory') == '' and 'subcategory' in old:
                        row['subcategory'] = old['subcategory']
                        row['subcategory_score'] = old['subcategory_score']
                    if row.get('runtime') == '' and 'runtime' in old:
                        row['runtime'] = old['runtime']
                            
                #print(row)
                #sys.exit()
                
                store_data(row, json_fp, toJson=True)
                print(json_fp)
                j += 1

            #print(i, row['link'])
            i += 1

            # keep count of # rows processed
            if i % 100 == 0:
                print(i)

            if quit != 0 and i >= quit:
                break

        # store parsed csv
        #fp = csv_in.split('/')[-1]
        #df.to_csv(csv_out + fp, sep=';', index=False)
        #path = json_out + fp
        #path = path.replace('.csv', '.json')
        #store_data(out, path, toJson=True)
        
        print('DONE parsed', i, 'items')
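The loop relies on two small ideas: each item is stored under the MD5 hex digest of its link, and on overwrite the stored record is merged with the new one so that new values win. The sketch below isolates both. `json_path_for`, `merge_records` and the example link are hypothetical and only illustrate the scheme.

```python
import hashlib

def json_path_for(link, folder):
    # Same scheme as the loop: md5 of the UTF-8 encoded link, plus '.json'.
    return folder + hashlib.md5(link.encode('utf-8')).hexdigest() + '.json'

def merge_records(old, new, drop=('score',)):
    # Later keys win, so freshly computed values override stored ones,
    # while fields only present in the old record (e.g. summaries) survive.
    merged = {**(old or {}), **new}
    for key in drop:
        merged.pop(key, None)
    return merged

# Hypothetical records and link, for illustration only.
old = {'title': 'Old', 'sum_t5': 'stored summary', 'score': 3}
new = {'title': 'New', 'runtime': 0.5}
merged = merge_records(old, new)
path = json_path_for('https://example.com/project', '../data/database/json/demo/')
print(merged)  # {'title': 'New', 'sum_t5': 'stored summary', 'runtime': 0.5}
print(path)
```

Because the file name depends only on the link, re-running the loop on the same CSV addresses the same JSON files, which is what makes the `overwrite` check possible.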

In [44]:
# run the loop

#transform = ['ka_c', 'ka_cn', 'ka_d', 'ka_dn', 'ma', 'gh', 'tcp', 'bc']
transform = ['ka_c', 'ka_cn', 'ma', 'gh', 'tcp', 'bc']
#transform = ['ma']

datasets = {
    # kaggle competitions
    'ka_c': {
        'csv_in': '../data/database/kaggle_competitions_correlated_01.csv',
        'csv_format': 'kaggle_competition',
    },
    # kaggle competitions notebooks
    'ka_cn': {
        'csv_in': '../data/database/kaggle_competitions_01_original.csv',
        'csv_format': 'kaggle_notebook',
    },
    # kaggle datasets
    'ka_d': {
        'csv_in': '../data/database/kaggle_datasets_correlated_01.csv',
        'csv_format': 'kaggle_dataset',
    },
    # kaggle datasets notebooks
    'ka_dn': {
        'csv_in': '../data/database/kaggle_datasets_01_original.csv',
        'csv_format': 'kaggle_notebook',
    },
    # mlart
    'ma': {
        'csv_in': '../data/database/mlart_01_original.csv',
        'csv_format':'mlart',
    },
    # github
    'gh': {
        'csv_in': '../data/database/db_04_analyzed_v02.csv',
        'csv_format': 'github',
    },
    # thecleverprogrammer
    'tcp': {
        'csv_in': '../data/database/thecleverprogrammer_01_original.csv',
        'csv_format': 'tcp',
    },
    # blobcity
    'bc': {
        'csv_in': '../data/database/blobcity_02_analyzed.csv',
        'csv_format': 'github',
    },
}

    
for key in transform:
    print(key)
    item = datasets[key]
    transform_loop(item['csv_in'], item['csv_format'], key+'/', overwrite=True)

ka_c
0 https://www.kaggle.com/c/20-newsgroups-ciphertext-challenge
summarize
../data/database/json/ka_c/07554a25ba5010fc437c588c02637782.json
1 https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles
summarize
../data/database/json/ka_c/998dc92361462a193b1eee10472bc19e.json
2 https://www.kaggle.com/c/abstraction-and-reasoning-challenge
summarize
../data/database/json/ka_c/f1b8d78bbe5c3441abb14e95f265c987.json
3 https://www.kaggle.com/c/aerial-cactus-identification
summarize
../data/database/json/ka_c/10172da4e494109a2ed53419081a1096.json
4 https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings
../data/database/json/ka_c/c4455229a9996e08978b53afd46e7fea.json
5 https://www.kaggle.com/c/airbus-ship-detection
summarize
../data/database/json/ka_c/a09ba09d429792c7deec1b662bb71827.json
6 https://www.kaggle.com/c/alaska2-image-steganalysis
summarize
../data/database/json/ka_c/647c8d2facd6bc052331da47849e6ed2.json
7 https://www.kaggle.com/c/allstate-claims-severity
../data/da

../data/database/json/ka_c/3d35b75c2ebd07971f12b4134167178d.json
66 https://www.kaggle.com/c/facebook-v-predicting-check-ins
summarize
../data/database/json/ka_c/39bcebdca682806d5d19fdbb82fa457d.json
67 https://www.kaggle.com/c/FacebookRecruiting
../data/database/json/ka_c/922cb868b7bce032f1b4bcb8b83cfb6c.json
68 https://www.kaggle.com/c/facial-keypoints-detection
../data/database/json/ka_c/3e8014290b0c66e49df19ae5006f90d2.json
69 https://www.kaggle.com/c/favorita-grocery-sales-forecasting
../data/database/json/ka_c/70326b2b01e875a3eedb646d694900f9.json
70 https://www.kaggle.com/c/flavours-of-physics-kernels-only
summarize
../data/database/json/ka_c/214cdb59fbf0df4fd87cdc36706a4fae.json
71 https://www.kaggle.com/c/flower-classification-with-tpus
summarize
../data/database/json/ka_c/7654365a4e65e4c949241d8d31e8c65f.json
72 https://www.kaggle.com/c/forest-cover-type-kernels-only
summarize
../data/database/json/ka_c/b1756523af13a69ea6cb499042391ee1.json
73 https://www.kaggle.com/c/forest-

../data/database/json/ka_c/f26016d0466158fed9010f0e16afddfb.json
131 https://www.kaggle.com/c/m5-forecasting-uncertainty
summarize
../data/database/json/ka_c/d1ad59124515d2f423209210f16c54af.json
132 https://www.kaggle.com/c/march-machine-learning-mania-2017
../data/database/json/ka_c/e103b32aaeea3888aa4a55459013d110.json
133 https://www.kaggle.com/c/melbourne-university-seizure-prediction
summarize
../data/database/json/ka_c/cfaa48b95bb56c33acb01eabe3639b58.json
134 https://www.kaggle.com/c/mens-machine-learning-competition-2018
summarize
../data/database/json/ka_c/9a6b17e044d8f03cc816cf365ee03cdf.json
135 https://www.kaggle.com/c/mens-machine-learning-competition-2019
summarize
../data/database/json/ka_c/00a474022f69b0bf64c8fa1136dc1ee0.json
136 https://www.kaggle.com/c/mercari-price-suggestion-challenge
summarize
../data/database/json/ka_c/d208af0c992558f906f911da1b2fd9e4.json
137 https://www.kaggle.com/c/mercedes-benz-greener-manufacturing
summarize
../data/database/json/ka_c/7d92d

../data/database/json/ka_c/3a08373a4445f764a226b5766bc8ac1f.json
196 https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries
summarize
../data/database/json/ka_c/fd3376077e08958b41fa55c88056403d.json
197 https://www.kaggle.com/c/two-sigma-financial-modeling
summarize
../data/database/json/ka_c/b04f1133f07bff6fca4989a54be24a8c.json
198 https://www.kaggle.com/c/two-sigma-financial-news
summarize
../data/database/json/ka_c/44643fdaf17de0393848b49efb013145.json
199 https://www.kaggle.com/c/ultrasound-nerve-segmentation
../data/database/json/ka_c/b68bcd3ee8f86550db46cd818eda36c2.json
200
200 https://www.kaggle.com/c/understanding_cloud_organization
summarize
../data/database/json/ka_c/779fb2f20c9f3c122cecd42a5ec4ff52.json
201 https://www.kaggle.com/c/unimelb
summarize
../data/database/json/ka_c/5caf935c730bbb3d87a756098c1357e5.json
202 https://www.kaggle.com/c/vsb-power-line-fault-detection
../data/database/json/ka_c/f0f6a5def6dd3e2f9b15bd943d54ed29.json
203 https://www.kaggle.c

../data/database/json/ka_cn/d7639adda948c31705d90368d0b311b1.json
42 https://www.kaggle.com/abhinand05/in-depth-guide-to-convolutional-neural-networks
summarize
../data/database/json/ka_cn/e6f710de5baaebe3efad9bbe26fa51db.json
43 https://www.kaggle.com/aleksandradeis/arial-cactus-identification-with-pytorch-and-vgg16
../data/database/json/ka_cn/b81a1cd1c7860f8cd9d42bf141f23a53.json
44 https://www.kaggle.com/arjunrao2000/beginners-guide-efficientnet-with-keras
../data/database/json/ka_cn/57b445bf9895d6e4987ecb0bf786c008.json
45 https://www.kaggle.com/artgor/detecting-cactus-with-kekas
../data/database/json/ka_cn/9605df1578ddcdf94e5caff77831ad43.json
46 https://www.kaggle.com/ateplyuk/keras-starter-efficientnet
../data/database/json/ka_cn/a93e15858e7dc716c37eb84e6e7d633d.json
47 https://www.kaggle.com/ateplyuk/pytorch-efficientnet
../data/database/json/ka_cn/6167c484c9dc86fb267e15d984479dd9.json
48 https://www.kaggle.com/ateplyuk/starter-pytorch
../data/database/json/ka_cn/88b4af19ebc0af

../data/database/json/ka_cn/6b9f760d813aa27d95c72fc40936b75e.json
107 https://www.kaggle.com/chanhu/eye-efficientnet-pytorch-lb-0-777
../data/database/json/ka_cn/6fcddab254c61b8884b91861333669b7.json
108 https://www.kaggle.com/drhabib/starter-kernel-for-0-79
../data/database/json/ka_cn/53d9a08d4a4587102b825f3d6778b873.json
109 https://www.kaggle.com/hmendonca/efficientnetb4-fastai-blindness-detection
../data/database/json/ka_cn/8c1986736940cf384cab35bc672d6813.json
110 https://www.kaggle.com/mathormad/aptos-resnet50-baseline
../data/database/json/ka_cn/9ff56660d7830f1793ed4fb3618b4c77.json
111 https://www.kaggle.com/ratan123/aptos-2019-keras-baseline
../data/database/json/ka_cn/774668ee481a03a5245b883b644e4acb.json
112 https://www.kaggle.com/ratthachat/aptos-augmentation-visualize-diabetic-retinopathy
../data/database/json/ka_cn/e0035218894e5c2062d2f99c9de587cb.json
113 https://www.kaggle.com/ratthachat/aptos-eye-preprocessing-in-diabetic-retinopathy
../data/database/json/ka_cn/82d937e

../data/database/json/ka_cn/4d7d328163c6664f23c8ca48c70e55aa.json
169 https://www.kaggle.com/tunguz/adversarial-geotab
../data/database/json/ka_cn/d2f357a14eb95ce86a5e6ec9b8702a1f.json
170 https://www.kaggle.com/vikassingh1996/thoughtful-eda-feature-engineering-and-lightgbm
../data/database/json/ka_cn/2a7b4e9066c4e0a65a6bc2abc14c0a28.json
171 https://www.kaggle.com/biphili/we-live-in-era-of-sharing-economy
../data/database/json/ka_cn/a05e6eee73ab2a77568f0758765073bf.json
172 https://www.kaggle.com/biphili/why-car-when-you-can-bike
../data/database/json/ka_cn/7d01d9be6fd37fec17e6549500bb1706.json
173 https://www.kaggle.com/fatmakursun/bike-sharing-feature-engineering
../data/database/json/ka_cn/cf7cdaf83a746516bd9364e3c3869ee9.json
174 https://www.kaggle.com/fredkron/eda-ml-on-bike-sharing
../data/database/json/ka_cn/e16f7d6c31270ec951a58c5d9fdc881b.json
175 https://www.kaggle.com/hanifansari93/bike-sharing-demand-eda-modeling
../data/database/json/ka_cn/e17f4ec04b34678a7cc148e26702b229

231 https://www.kaggle.com/purplejester/pytorch-deep-time-series-classification
../data/database/json/ka_cn/3eb8f9bf6a10fb1cc566615c66bff1d1.json
232 https://www.kaggle.com/theoviel/deep-learning-starter
../data/database/json/ka_cn/2c6a6431d5bc9d547e4923d3b5ed7c1a.json
233 https://www.kaggle.com/trohwer64/simple-fourier-analysis
../data/database/json/ka_cn/bebe70b976d0195f80460f12ff0a863d.json
234 https://www.kaggle.com/whoiskk/validation-strategy-randomforest-0-71
../data/database/json/ka_cn/d28b0eaa13395a29231554bda8e7d85c.json
235 https://www.kaggle.com/ecobill/u-nets-with-keras
../data/database/json/ka_cn/9442adb201223eae813bf321d7bcb390.json
236 https://www.kaggle.com/gaborfodor/augmentation-methods
../data/database/json/ka_cn/dfdfc889f7d952b92572c86b5351d712.json
237 https://www.kaggle.com/kmader/vgg16-u-net-on-carvana
../data/database/json/ka_cn/81a992aef0fc98f840c98d3d3bd7e967.json
238 https://www.kaggle.com/stainsby/fast-tested-rle
../data/database/json/ka_cn/703ac6cd5abb7c0be

295 https://www.kaggle.com/karam123/cnn-and-lenet-on-cifar10
../data/database/json/ka_cn/f3485d7c35d859104c42905c3873a138.json
296 https://www.kaggle.com/kedarsai/cifar-10-88-accuracy-using-keras
summarize
../data/database/json/ka_cn/b6c4be52739b9c77f2da17074485cda4.json
297 https://www.kaggle.com/roblexnana/cifar10-with-cnn-for-beginer
../data/database/json/ka_cn/d6b01497c1b073b08977165d7e1d8e58.json
298 https://www.kaggle.com/vikasbhadoria/cifar10-high-accuracy-model-build-on-pytorch
../data/database/json/ka_cn/b6a4dd46a41bc469fd3e804380fe4851.json
299 https://www.kaggle.com/bsteenwi/cracking-the-code-difficulty-2
../data/database/json/ka_cn/69959b0d443ad9a6021839eaa41aff32.json
300
300 https://www.kaggle.com/group16/cracking-the-code-difficulty-3
../data/database/json/ka_cn/607d0a93eb1a24cf6411d9093599e0ef.json
301 https://www.kaggle.com/jshen97/a-glance-to-ciphertext-level-4
../data/database/json/ka_cn/5d508bf7aac48133e31d1d076b2a6165.json
302 https://www.kaggle.com/junkoda/finding

../data/database/json/ka_cn/109c2e90ef83549c75cd83c380feddc9.json
361 https://www.kaggle.com/corochann/covid-19-spread-situation-by-prefecture-in-japan
../data/database/json/ka_cn/45ccb3742326696b3d598790542c7271.json
362 https://www.kaggle.com/dferhadi/covid-19-predictions-growth-factor-and-calculus
../data/database/json/ka_cn/f520673f28ed3ba218f106ca75bf65d1.json
363 https://www.kaggle.com/janmejoy/covid19-time-series-analysis-plotly-visualization
summarize
../data/database/json/ka_cn/98359f9c7ac1f77a00e05a4e71df4e13.json
364 https://www.kaggle.com/jorijnsmit/mathematical-solution-to-sigmoid-parameters
../data/database/json/ka_cn/a309540f6338869be20d68cb5004e337.json
365 https://www.kaggle.com/mrmorj/covid-19-eda-xgboost
../data/database/json/ka_cn/c635ba5db3d44d8f8da15f137fbc1362.json
366 https://www.kaggle.com/nitishabharathi/the-story-of-covid-19-in-india-eda-and-prediction
../data/database/json/ka_cn/ba74bd842e58d31bba920a31605a6c5f.json
367 https://www.kaggle.com/aestheteaman01/

../data/database/json/ka_cn/bd0fbc71b4e178cc2c8a9fefec3c43b1.json
421 https://www.kaggle.com/jakubczakon/morphological-postprocessing-on-unet-lb-0-429
../data/database/json/ka_cn/4964d49c33bce0ead3e30240c435430f.json
422 https://www.kaggle.com/jerrythomas/exploratory-analysis
summarize
../data/database/json/ka_cn/3b6be01ced175e2dd9f2646e34693d94.json
423 https://www.kaggle.com/keegil/keras-u-net-starter-lb-0-277
../data/database/json/ka_cn/e1ae6c9f994333470fe8c96f8d59341b.json
424 https://www.kaggle.com/kmader/normalizing-brightfield-stained-and-fluorescence
../data/database/json/ka_cn/48e8e4558b802d6362b993bb3c17748c.json
425 https://www.kaggle.com/kmader/nuclei-overview-to-submission
../data/database/json/ka_cn/e3e66831c8a04f2566ce6bb9c3b5e14a.json
426 https://www.kaggle.com/mpware/ensembling-on-instance-segmentation-lb-0-419
../data/database/json/ka_cn/9ca67068292919f68f0c10edb2cede65.json
427 https://www.kaggle.com/mpware/stage1-eda-microscope-image-types-clustering
../data/databas

../data/database/json/ka_cn/79ca05d63ad6605d9ca3ded9aa39a58c.json
483 https://www.kaggle.com/pranaymns/datascience-london-sklearn-rfc-svm
../data/database/json/ka_cn/527d2e0148172d462a0954b85b55fca3.json
484 https://www.kaggle.com/sabahkarim/ds-tutorial-pca-gussian-mixture-grid-search
../data/database/json/ka_cn/4d78a7cda4f35f11d2dcf06f0a0070fe.json
485 https://www.kaggle.com/spanda2/data-science-london-scikit
../data/database/json/ka_cn/7337186552b898a4350091456169cd14.json
486 https://www.kaggle.com/brassmonkey381/a-quick-look-at-the-first-frame-of-each-video
../data/database/json/ka_cn/df8ef8d967eff0aacaa921fe8fb20e96.json
487 https://www.kaggle.com/gpreda/deepfake-starter-kit
../data/database/json/ka_cn/6108456224b8b1f81c18eb2a0e007939.json
488 https://www.kaggle.com/greatgamedota/xception-classifier-w-ffhq-training-lb-537
../data/database/json/ka_cn/6305a95e04790411f518d3a1a239036f.json
489 https://www.kaggle.com/hmendonca/proper-clustering-with-facenet-embeddings-eda
../data/data

../data/database/json/ka_cn/d399a657e096b32d132872ec23847bd9.json
546 https://www.kaggle.com/pintu161/transfer-learning-in-pytorch-using-resnet18
../data/database/json/ka_cn/a4ba2ba49649fb4306fd7f2b4eb024b1.json
547 https://www.kaggle.com/rblcoder/learning-cnn-in-tensorflow-coursera-course
../data/database/json/ka_cn/57e6575983d8c9ae95f3abab0a822fd9.json
548 https://www.kaggle.com/ruchibahl18/cats-vs-dogs-basic-cnn-tutorial
../data/database/json/ka_cn/36f626ef4a9e3766667fd3af24bfbc7a.json
549 https://www.kaggle.com/serkanpeldek/keras-cnn-transfer-learnings-on-cats-dogs-dataset
../data/database/json/ka_cn/6ce7e21a2235d9078cf9c8942e016d9c.json
550 https://www.kaggle.com/subhamoybhaduri/cnn-cat-and-dog-classification
../data/database/json/ka_cn/4995f823565e095690669ca986827659.json
551 https://www.kaggle.com/uysimty/keras-cnn-dog-or-cat-classification
../data/database/json/ka_cn/16e15a2fba1bb4dc2779a924e2ee4efc.json
552 https://www.kaggle.com/abhiksark/introduction-to-transfer-learning-ca

../data/database/json/ka_cn/cbd1085872d93e928c9c463374017864.json
609 https://www.kaggle.com/caesarlupum/ds4g-go-to-the-green-future
../data/database/json/ka_cn/0d4a56b1e019c5252ba92cfa1eba1b28.json
610 https://www.kaggle.com/caesarlupum/green-future-analysis-and-solution
../data/database/json/ka_cn/15a82942bf88c066d933e0b0d85f6b79.json
611 https://www.kaggle.com/caesarlupum/green-future-anomaly-analysis-time-series
../data/database/json/ka_cn/83db3177c9c04397a522f8ccfdc811f6.json
612 https://www.kaggle.com/chrisarderne/ds4g-an-analytical-approach-to-no2-emissions
summarize
../data/database/json/ka_cn/56f6c3a357ceda1bd7024d9ab05577a7.json
613 https://www.kaggle.com/deepakdeepu8978/methodology-for-average-historical-emissions
../data/database/json/ka_cn/955c9363503d13ec715227761a179542.json
614 https://www.kaggle.com/gpoulain/eda-ef-with-n2o-time-series-earth-engine
../data/database/json/ka_cn/11c01f883455d912046e2266cbe2a874.json
615 https://www.kaggle.com/katemelianova/ds4g-spatial-pa

../data/database/json/ka_cn/a8cd111bbda2077691c27f5251cbd05b.json
672 https://www.kaggle.com/adityaecdrid/quest-to-use-use-pytorch-xla
../data/database/json/ka_cn/16b98de7312f986557886453e06acb87.json
673 https://www.kaggle.com/cdeotte/cutmix-and-mixup-on-gpu-tpu
../data/database/json/ka_cn/ed1dac17a73eff8f6948c62b6c3d8ece.json
674 https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96
../data/database/json/ka_cn/369325a408a56e3aa38c63ca223224a8.json
675 https://www.kaggle.com/dhananjay3/fast-pytorch-xla-for-tpu-with-multiprocessing
../data/database/json/ka_cn/127550c5a7910d09583247904cc806e7.json
676 https://www.kaggle.com/dhananjay3/pytorch-xla-for-tpu-with-multiprocessing
../data/database/json/ka_cn/42a758cdac2a5eb7959bd20274b44670.json
677 https://www.kaggle.com/mgornergoogle/custom-training-loop-with-100-flowers-on-tpu
summarize
../data/database/json/ka_cn/833e6ab327186c742d4bd1f5a38fc4ca.json
678 https://www.kaggle.com/mmmarchetti/flowers-on-tpu-ii
../data/database/jso

../data/database/json/ka_cn/bafa470cdd7729ae1874734002058fb9.json
737 https://www.kaggle.com/erikbruin/google-analytics-eda-lightgbm-screenshots
summarize
../data/database/json/ka_cn/ee44bfa03c155aff6de345b33b28b62d.json
738 https://www.kaggle.com/fabiendaniel/lgbm-starter
../data/database/json/ka_cn/738983e985009145e64dbba5ad8fdaf3.json
739 https://www.kaggle.com/julian3833/2-quick-study-lgbm-xgb-and-catboost-lb-1-66
summarize
../data/database/json/ka_cn/a154ac1c8dcd66516443e3cf7bc0c30e.json
740 https://www.kaggle.com/kabure/exploring-the-consumer-patterns-ml-pipeline
../data/database/json/ka_cn/93ce895318ccdd4c94800d3fc576bcb5.json
741 https://www.kaggle.com/ogrellier/i-have-seen-the-future
../data/database/json/ka_cn/033157b8675e4b43a2ddbf1f6c578bb0.json
742 https://www.kaggle.com/ogrellier/teach-lightgbm-to-sum-predictions
../data/database/json/ka_cn/9622c5b5f61bbb95c5cc812a6a3f67bd.json
743 https://www.kaggle.com/ogrellier/using-classification-for-predictions
../data/database/json

../data/database/json/ka_cn/7225cd9b553083d44f87b67d7f9f7cbc.json
801 https://www.kaggle.com/shonenkov/oof-evaluation-mixup-efficientdet
../data/database/json/ka_cn/4dff772238d8599aac0b4cdad3d6aa2c.json
802 https://www.kaggle.com/shonenkov/training-efficientdet
../data/database/json/ka_cn/8f64cb77b3b690dbc166f7be2b8dffe7.json
803 https://www.kaggle.com/shonenkov/wbf-approach-for-ensemble
../data/database/json/ka_cn/9ac3abf5769b82d6054eece339f1e877.json
804 https://www.kaggle.com/shonenkov/wbf-over-tta-single-model-efficientdet
../data/database/json/ka_cn/ff10938428d4bb15c7cced5f38d533af.json
805 https://www.kaggle.com/tanulsingh077/end-to-end-object-detection-with-transformers-detr
../data/database/json/ka_cn/1e1ebfd8496ba72ee159f629e193edcb.json
806 https://www.kaggle.com/ufownl/global-wheat-detection-pseudo-labaling
../data/database/json/ka_cn/27c64905d679682780359db3f595cafe.json
807 https://www.kaggle.com/aldrin644/analysis-between-new-and-old-open-image-dataset
../data/database/js

../data/database/json/ka_cn/ebf9ea384788fa1a44e0342715b78e5c.json
867 https://www.kaggle.com/huikang/hc-2019q-eda-and-baseline-soln
../data/database/json/ka_cn/488d59e0274dd5c307692e7b9e29c34f.json
868 https://www.kaggle.com/drobchak1988/herbarium-2020-fgvc7-create-tfrecords-tensorflow
../data/database/json/ka_cn/55a6a2e4536dc801ea0fc7bdaea64006.json
869 https://www.kaggle.com/jagannathrk/herbarium-2020
../data/database/json/ka_cn/9b2f0ff09326843d4265b71711e5a15f.json
870 https://www.kaggle.com/jullang/herbarium-via-resnet50-and-3-step-classification
../data/database/json/ka_cn/bc6cb82b7543df6a07e7cc5c092736e2.json
871 https://www.kaggle.com/rsingh99/getting-started-with-herbarium-2020
../data/database/json/ka_cn/3bbf312b123884d121f5a69c8ce85e74.json
872 https://www.kaggle.com/seraphwedd18/herbarium-consolidating-the-details
../data/database/json/ka_cn/42c8c33e11648b247ec4b86a87362fa1.json
873 https://www.kaggle.com/sergey55/herbarium-2020-notebook
../data/database/json/ka_cn/97f52d3d4

../data/database/json/ka_cn/70d0f29ad11a75e992564a9cefd176ea.json
927 https://www.kaggle.com/rejpalcz/cnn-128x128x4-keras-from-scratch-lb-0-328
../data/database/json/ka_cn/c89060a3c97cf276d692ba6c49be50b0.json
928 https://www.kaggle.com/rejpalcz/gapnet-pl-lb-0-385
../data/database/json/ka_cn/4a93b2fd2c22f0e597a758635dfb682a.json
929 https://www.kaggle.com/therealpythonman/get-350k-additional-hpa-images
../data/database/json/ka_cn/1361eef2ba5ae2f54e17e47d5cc5e145.json
930 https://www.kaggle.com/zhugds/resnet34-with-rgby-fast-ai-fork
summarize
../data/database/json/ka_cn/ec9436860975f7933d802c90f426052a.json
931 https://www.kaggle.com/artgor/pytorch-whale-identifier
../data/database/json/ka_cn/e8074bf1b8bfdb7510caeb28371efe77.json
932 https://www.kaggle.com/ashishpatel26/comprehensive-guide-of-object-detection-algorithms
../data/database/json/ka_cn/5321f2ae506028ca13ddb4dfac3664a3.json
933 https://www.kaggle.com/ashishpatel26/triplet-loss-network-for-humpback-whale-prediction
../data/dat

987 https://www.kaggle.com/yanastamenova/imaterialist-segmentation-task
../data/database/json/ka_cn/303a3af278f4f0dd3b5d0d3fa41b6541.json
988 https://www.kaggle.com/ateplyuk/efficientnet-keras-s75-b200-e20
../data/database/json/ka_cn/a5ca15dafa0c3bfb96539f0ab9c38acb.json
989 https://www.kaggle.com/ateplyuk/keras-starter
../data/database/json/ka_cn/06f829a36664e2f22424c0f8e4ebcc17.json
990 https://www.kaggle.com/backaggle/imet-fastai-starter-focal-and-fbeta-loss
../data/database/json/ka_cn/4d99adf7d463f7cd717bd8e4c66a5b0b.json
991 https://www.kaggle.com/chewzy/eda-weird-images-with-new-updates
../data/database/json/ka_cn/4af000148d620e0888add54d02868f89.json
992 https://www.kaggle.com/dimitreoliveira/imet-collection-2019-eda-keras
../data/database/json/ka_cn/796684ce3b6fbe4172a4c19a6f5d3f74.json
993 https://www.kaggle.com/h4211819/leaderboard-analysis
../data/database/json/ka_cn/2a980c1ce00fc2627dffb6313d8854c3.json
994 https://www.kaggle.com/hidehisaarai1213/imet-pytorch-starter
../dat

../data/database/json/ka_cn/80deeae6ed35aa2bc9cdb982f9f9bcdb.json
1051 https://www.kaggle.com/kylehounslow/a-method-for-finding-leaked-images-in-test-set
../data/database/json/ka_cn/e3b9bc28e02454f9d62e2acbd7136b77.json
1052 https://www.kaggle.com/miguelpm/r-mxnet-simple-tutorial
summarize
../data/database/json/ka_cn/267781a7f177dd6ae4d62f652c07e213.json
1053 https://www.kaggle.com/philschmidt/cervix-eda-model-selection
../data/database/json/ka_cn/c02ac71259c94b703e14c5869e3c68cf.json
1054 https://www.kaggle.com/poonaml/intel-cervical-cancer-eda
../data/database/json/ka_cn/18ba934ac7446763b9c10bf8c432d76b.json
1055 https://www.kaggle.com/scottykwok/making-sense-out-of-some-difficult-samples
../data/database/json/ka_cn/999d2b69da264d7ef913798ab53e6822.json
1056 https://www.kaggle.com/vfdev5/data-exploration-1
../data/database/json/ka_cn/93eda97b0e735e0fbea3fc02b1006a28.json
1057 https://www.kaggle.com/vfdev5/type-1-clustering
../data/database/json/ka_cn/7ca257db62ab256de2d5938e6dab494a.

../data/database/json/ka_cn/eb73b57320e6b6400a8a07660d2ed74c.json
1111 https://www.kaggle.com/jhoward/nb-svm-strong-linear-baseline
../data/database/json/ka_cn/5edb2e7bdb36bfbe0d55f9695e2a158b.json
1112 https://www.kaggle.com/rhodiumbeng/classifying-multi-label-comments-0-9741-lb
../data/database/json/ka_cn/13fba9d97b5071cd65458c5de5101d58.json
1113 https://www.kaggle.com/sbongo/do-pretrained-embeddings-give-you-the-extra-edge
summarize
../data/database/json/ka_cn/0a59d2464fa64d291de75c12975bd525.json
1114 https://www.kaggle.com/sbongo/for-beginners-tackling-toxic-using-keras
../data/database/json/ka_cn/013e5446370ff8f918ff72fafdf8985c.json
1115 https://www.kaggle.com/abhishek/pytorch-bert-inference
../data/database/json/ka_cn/175814c535141f0a92fff37d692d7e8b.json
1116 https://www.kaggle.com/adityaecdrid/public-version-text-cleaning-vocab-65
../data/database/json/ka_cn/6537fcf56b1c92045cc8140ad38385fc.json
1117 https://www.kaggle.com/artgor/cnn-in-keras-on-folds
../data/database/json/k

../data/database/json/ka_cn/d82d1e4c92f20ab5b98d6b6231424c93.json
1173 https://www.kaggle.com/arjoonn/preliminary-exploration
../data/database/json/ka_cn/2b5cf05a2f84eaa48e158a162a0760ec.json
1174 https://www.kaggle.com/dixhom/data-analysis-for-beginners
../data/database/json/ka_cn/db021b56f62eadf50c20a995305048d9.json
1175 https://www.kaggle.com/kevins/kobe-shots-show-me-your-best-model
../data/database/json/ka_cn/bd2cae2c69a796739891c9e18029a1fe.json
1176 https://www.kaggle.com/khozzy/kobe-shots-show-me-your-best-model
../data/database/json/ka_cn/a4190de78c2c7e7eba268372fdc35a93.json
1177 https://www.kaggle.com/mpwolke/kobe-bryant-dear-basketball
../data/database/json/ka_cn/b843d31440bc8b8e2013dfd2cc2f523e.json
1178 https://www.kaggle.com/selfishgene/psychology-of-a-professional-athlete
../data/database/json/ka_cn/ceec8559d207676f8b65314525e72b58.json
1179 https://www.kaggle.com/basu369victor/kuzushiji-recognition-just-like-digit-recognition
../data/database/json/ka_cn/6291ea2f9d8df5

../data/database/json/ka_cn/2d471c51fb57e75e89f8b7680a2d1103.json
1231 https://www.kaggle.com/allunia/shaking-earth
../data/database/json/ka_cn/a3232accc1bb977dfa6793c0f744a5c7.json
1232 https://www.kaggle.com/artgor/earthquakes-fe-more-features-and-samples
../data/database/json/ka_cn/3db66828403c8df89a2fcb8163463e03.json
1233 https://www.kaggle.com/artgor/even-more-features
../data/database/json/ka_cn/920c79e7c3a4092dc75b9c6d1ac92364.json
1234 https://www.kaggle.com/artgor/feature-selection-model-interpretation-and-more
../data/database/json/ka_cn/600cb1c2fef2b2fbc268420772e84aaf.json
1235 https://www.kaggle.com/artgor/seismic-data-eda-and-baseline
../data/database/json/ka_cn/6cb3314b02442caffe0223b5a94f5d42.json
1236 https://www.kaggle.com/avloss/audio-analysis-with-animation
../data/database/json/ka_cn/8c25c701e06d9aeee29e167556adf594.json
1237 https://www.kaggle.com/bigironsphere/parameter-tuning-in-one-function-with-hyperopt
summarize
../data/database/json/ka_cn/f56b02f9be542dee3f

../data/database/json/ka_cn/3c279990d5167bf0df409681045fda02.json
1295 https://www.kaggle.com/tuckerarrants/lyft-ensembling-raster-sizes
summarize
../data/database/json/ka_cn/0c8b50e48042bd858c73d68e5a4ef8c5.json
1296 https://www.kaggle.com/anshuls235/time-series-forecasting-eda-fe-modelling
../data/database/json/ka_cn/8dc56815ff5e684aa171f3f150cc6601.json
1297 https://www.kaggle.com/girmdshinsei/for-japanese-beginner-with-wrmsse-in-lgbm
../data/database/json/ka_cn/e5e83f6eabfac829d1ef5dbef18fa67c.json
1298 https://www.kaggle.com/harupy/m5-baseline
../data/database/json/ka_cn/111a7bd9dfc1aa8829e3b143a0f52fa8.json
1299 https://www.kaggle.com/kneroma/m5-first-public-notebook-under-0-50
../data/database/json/ka_cn/4bf1542e5d81d7af8b8ead571de7aba9.json
1300
1300 https://www.kaggle.com/kneroma/m5-forecast-v2-python
../data/database/json/ka_cn/41dd1ab94cec26da3a160d2db6b8eecc.json
1301 https://www.kaggle.com/kyakovlev/m5-custom-features
summarize
../data/database/json/ka_cn/f2fb7ff5b607ea448

../data/database/json/ka_cn/94cfc57ce7f182ba6936e7f9bb6c9cd9.json
1354 https://www.kaggle.com/knowledgegrappler/a-simple-nn-solution-with-keras-0-48611-pl
../data/database/json/ka_cn/5a2ffff0335555faa5f5eead7ab1c3d6.json
1355 https://www.kaggle.com/lopuhin/eli5-for-mercari
../data/database/json/ka_cn/97e1618a94695906741f096a42da5578.json
1356 https://www.kaggle.com/maheshdadhich/i-will-sell-everything-for-free-0-55
summarize
../data/database/json/ka_cn/c4beb2f9a26e68abcd744d8adda0a4ff.json
1357 https://www.kaggle.com/thykhuely/mercari-interactive-eda-topic-modelling
../data/database/json/ka_cn/ff886a11908d34c9896faded0ff870a3.json
1358 https://www.kaggle.com/tilii7/cross-validation-weighted-linear-blending-errors
summarize
../data/database/json/ka_cn/3f13607fd47befebc955205161294e22.json
1359 https://www.kaggle.com/valkling/mercari-rnn-2ridge-models-with-notes-0-42755
../data/database/json/ka_cn/ad094e0fb21933cf95c62807706a792a.json
1360 https://www.kaggle.com/anokas/mercedes-eda-xgboo

../data/database/json/ka_cn/977cc04619105aaaf0bd3cb61b8e0749.json
1416 https://www.kaggle.com/mrkmakr/neural-network-with-mae-objective-0-01381
../data/database/json/ka_cn/502de5021b0c325b8121ee3332e5bb2d.json
1417 https://www.kaggle.com/robikscube/big-data-bowl-comprehensive-eda-with-pandas
../data/database/json/ka_cn/bbdf9a3afea06f7e144cc8669d5a6a38.json
1418 https://www.kaggle.com/robikscube/nfl-big-data-bowl-plotting-player-position
../data/database/json/ka_cn/84bdf4a979843491ec29e4439e60ec61.json
1419 https://www.kaggle.com/statsbymichaellopez/nfl-tracking-wrangling-voronoi-and-sonars
summarize
../data/database/json/ka_cn/839a66eb3b90b76f7038122723359bd2.json
1420 https://www.kaggle.com/sudalairajkumar/simple-exploration-notebook-nfl
../data/database/json/ka_cn/fbe00db1b7f7de8655ed5c8d9c1b7a4a.json
1421 https://www.kaggle.com/benjenkins96/nfl-1st-and-future-analysis
summarize
../data/database/json/ka_cn/46c8e607372767b48933cbb39fc39eef.json
1422 https://www.kaggle.com/david289/nfl

../data/database/json/ka_cn/81cc02bc856769af98013b185da37e52.json
1478 https://www.kaggle.com/poonaml/last-cab-to-new-york-animated-heatmap-trips-folium
../data/database/json/ka_cn/f0f667973de5be70950adb25ff584968.json
1479 https://www.kaggle.com/selfishgene/yellow-cabs-tell-the-story-of-new-york-city
../data/database/json/ka_cn/b0988694d25c30b2eee93a9cafcf249f.json
1480 https://www.kaggle.com/sheriytm/brewed-tpot-for-nyc-with-love-lb0-37
../data/database/json/ka_cn/74bb315aad2c6446d6f36f4871ddfe79.json
1481 https://www.kaggle.com/wti200/exploratory-analysis-nyc-taxi-trip
../data/database/json/ka_cn/1d5fc34e82acf48a01ceb7ea4351822b.json
1482 https://www.kaggle.com/anshuljdhingra/instance-segmentation
../data/database/json/ka_cn/88d5f1d4e7ef1ee120ab4b0f3c606a08.json
1483 https://www.kaggle.com/gowrishankarin/basic-eda-with-plotly-and-images
../data/database/json/ka_cn/5eac56d71479659d5eac94df8ca3c3e3.json
1484 https://www.kaggle.com/iiyamaiiyama/how-to-submit-prediction
../data/database

1541 https://www.kaggle.com/artgor/exploration-of-data-step-by-step
../data/database/json/ka_cn/abc2302ef8f44666f65494c4c0e536ed.json
1542 https://www.kaggle.com/bminixhofer/5th-place-solution-code
summarize
../data/database/json/ka_cn/fa928dd373e28c581fc795968df590b2.json
1543 https://www.kaggle.com/carlolepelaars/eda-and-ensembling
../data/database/json/ka_cn/32399cd379147551c208b1b22f78f954.json
1544 https://www.kaggle.com/christofhenkel/extract-image-features-from-pretrained-nn
../data/database/json/ka_cn/3f9717344502f98d3f758cafdbc8dc72.json
1545 https://www.kaggle.com/erikbruin/petfinder-my-detailed-eda-and-xgboost-baseline
../data/database/json/ka_cn/f30543a8e161cb3780a1dab1f4b8f319.json
1546 https://www.kaggle.com/myltykritik/simple-lgbm-image-features
../data/database/json/ka_cn/f650592c3d00236cce4adf0fc8c2926d.json
1547 https://www.kaggle.com/naveenasaithambi/optimizedrounder-improved
../data/database/json/ka_cn/99f3768430ba8bd72b349379cd5aee08.json
1548 https://www.kaggle.co

../data/database/json/ka_cn/2639a7210f271bd3c4f9058dc2226a0e.json
1600
1600 https://www.kaggle.com/jimpsull/collaboratingwithkagglecommunity-1-037-lb
../data/database/json/ka_cn/45f08d8970ee09e125e776aab9ba63b8.json
1601 https://www.kaggle.com/johnfarrell/plasticc-2018-metadata-simple-eda
../data/database/json/ka_cn/dc1165f983c4c5b48306be661e597914.json
1602 https://www.kaggle.com/kyleboone/naive-benchmark-galactic-vs-extragalactic
../data/database/json/ka_cn/23f71d63731afb5d86972cc9fa73389b.json
1603 https://www.kaggle.com/meaninglesslives/lgb-parameter-tuning
../data/database/json/ka_cn/5288a1c126ef1e61b5bfc8c7e0e92aa8.json
1604 https://www.kaggle.com/michaelapers/the-plasticc-astronomy-classification-demo
../data/database/json/ka_cn/12fdc988901fff95b6162d84fa0b71a9.json
1605 https://www.kaggle.com/michaelapers/the-plasticc-astronomy-starter-kit
../data/database/json/ka_cn/832fb1b9bb500c20f99b8bb845dc5066.json
1606 https://www.kaggle.com/mithrillion/know-your-objective
../data/databa

../data/database/json/ka_cn/512e6087f2eedcf12b8b7baac102758c.json
1664 https://www.kaggle.com/bminixhofer/deterministic-neural-networks-using-pytorch
../data/database/json/ka_cn/08d6b86e19456fc470d2900f3b288441.json
1665 https://www.kaggle.com/christofhenkel/how-to-preprocessing-when-using-embeddings
../data/database/json/ka_cn/8e5dd00a2af354d4fecd6fb2b9085279.json
1666 https://www.kaggle.com/gmhost/gru-capsule
../data/database/json/ka_cn/52c6700d93e872e31b3d7aae01e536a3.json
1667 https://www.kaggle.com/jannen/reaching-0-7-fork-from-bilstm-attention-kfold
../data/database/json/ka_cn/871f1613800f18ba4ba4820b01062efc.json
1668 https://www.kaggle.com/rajmehra03/a-detailed-explanation-of-keras-embedding-layer
../data/database/json/ka_cn/c5886da1e3a6d195a4045693aed91d93.json
1669 https://www.kaggle.com/shujian/different-embeddings-with-attention-fork-fork
../data/database/json/ka_cn/cc0016d7fbaec7e6b56c421b953d27b4.json
1670 https://www.kaggle.com/shujian/mix-of-nn-models-based-on-meta-embe

../data/database/json/ka_cn/0e2c772c85045e95f21922da6893637d.json
1723 https://www.kaggle.com/jesucristo/quick-visualization-eda
../data/database/json/ka_cn/a2e8078fba0aef2974505c20aa3ddd34.json
1724 https://www.kaggle.com/pheaboo/a-journey-through-the-experiment-design
../data/database/json/ka_cn/ab0ee0cf449be1986355b5261a0ab827.json
1725 https://www.kaggle.com/roydatascience/cellular-stacking-1-5
../data/database/json/ka_cn/45ffd1e10b6cde91259b2cee5e1519bd.json
1726 https://www.kaggle.com/tanlikesmath/rcic-fastai-starter
../data/database/json/ka_cn/fb0cc5c7be2a5dd17dea7a979bb35d0b.json
1727 https://www.kaggle.com/xhlulu/recursion-2-headed-efficientnet-2-stage-training
summarize
../data/database/json/ka_cn/1d71402540385389f789609aafd33503.json
1728 https://www.kaggle.com/zaharch/keras-model-boosted-with-plates-leak
../data/database/json/ka_cn/052a6cbd9f99b66e22c150a0b48be7dc.json
1729 https://www.kaggle.com/adamdc/classifiers-competition-for-unbalanced-data
../data/database/json/ka_cn

../data/database/json/ka_cn/75ccd743b2efc586ad4fd26077a9a051.json
1784 https://www.kaggle.com/osciiart/baseline-with-no-image
../data/database/json/ka_cn/0d9334a3556891bda4ba2eb06854fefd.json
1785 https://www.kaggle.com/redwankarimsony/rsna-str-3d-stacking-3d-plot-segmentation
../data/database/json/ka_cn/ad5a6118b5edbae5bafcdc26c1597801.json
1786 https://www.kaggle.com/redwankarimsony/rsna-str-pe-gradient-sigmoid-windowing
../data/database/json/ka_cn/ef1894bf23a20bf54ac2e27beaae3d0c.json
1787 https://www.kaggle.com/rythian47/vision-transformer-goodbye-cnn-training
../data/database/json/ka_cn/dfe4066d5f425d3e410c8a2d977e44b0.json
1788 https://www.kaggle.com/seraphwedd18/pe-detection-with-keras-model-creation
../data/database/json/ka_cn/993792cc2ebb55a928beee023187ae46.json
1789 https://www.kaggle.com/wrrosa/advanced-dicom-ct-3d-visualizations-with-vtk
../data/database/json/ka_cn/b8ae36ba59f82fd80d821a63339c931d.json
1790 https://www.kaggle.com/aceconhielo/data-analysis-and-patterns-reco

1848 https://www.kaggle.com/khahuras/0-53x-clustering-using-hough-features-basic
../data/database/json/ka_cn/8888131dc24a24d0c50a86884514100d.json
1849 https://www.kaggle.com/mikhailhushchyn/dbscan-benchmark
../data/database/json/ka_cn/8436c859de739cc91e22f5075b5fef1d.json
1850 https://www.kaggle.com/mikhailhushchyn/hough-transform
../data/database/json/ka_cn/48130d9f8d1cf582e4e301fb03323bfb.json
1851 https://www.kaggle.com/mindcool/hdbscan-clustering-ii
../data/database/json/ka_cn/6079ba9d25d4989774040ac91b10a773.json
1852 https://www.kaggle.com/mindcool/unrolling-of-helices-outliers-removal
../data/database/json/ka_cn/1ba97773918a8f959c8cc3b6e1fb6ecb.json
1853 https://www.kaggle.com/outrunner/trackml-2-solution-example
../data/database/json/ka_cn/969ee7528f1d0f52bc7e080747da20ca.json
1854 https://www.kaggle.com/shivamb/trajectory-animation-eda
summarize
../data/database/json/ka_cn/0c6b9ed4b63c319e83896c47e4688fba.json
1855 https://www.kaggle.com/yuval6967/7th-place-clustering-extendi

../data/database/json/ka_cn/4b781345efac2feaecf0bb25686b2663.json
1912 https://www.kaggle.com/den3b81/improve-perfomances-using-manager-features
../data/database/json/ka_cn/e416fdf7101bfc2cff6b69153e336a11.json
1913 https://www.kaggle.com/kashnitsky/topic-6-feature-engineering-and-feature-selection
../data/database/json/ka_cn/bda9cafb459d09bdeeef219164fc6f94.json
1914 https://www.kaggle.com/poonaml/two-sigma-renthop-eda
../data/database/json/ka_cn/123e18ab146a7787d43ebffe4726c085.json
1915 https://www.kaggle.com/somnisight/microsoft-lightgbm-starter
../data/database/json/ka_cn/a0dbc64e0dc8107d8660c1d500515ddd.json
1916 https://www.kaggle.com/sudalairajkumar/xgb-starter-in-python
../data/database/json/ka_cn/799941621df20bfb383f6028845c7640.json
1917 https://www.kaggle.com/allunia/feature-dynamics-looking-at-id-groups
../data/database/json/ka_cn/090cf78502a7fcf22d66245c0d43a759.json
1918 https://www.kaggle.com/anokas/two-sigma-time-travel-eda
../data/database/json/ka_cn/4c06121ec821b2e44

1974 https://www.kaggle.com/roydatascience/eda-iso-pca-lle-stratified-lstm-attention
summarize
../data/database/json/ka_cn/9a5af88b358d877c99942a7f4f067604.json
1975 https://www.kaggle.com/suicaokhoailang/5-fold-lstm-with-threshold-tuning-0-618-lb
../data/database/json/ka_cn/b82c859e67a4f9c1a7fbc8148f15165b.json
1976 https://www.kaggle.com/suicaokhoailang/transformer-baseline-0-672-lb
../data/database/json/ka_cn/c4251449acb0513a7ab673ec79853aa4.json
1977 https://www.kaggle.com/tarunpaparaju/vsb-competition-attention-bilstm-with-features
../data/database/json/ka_cn/3efd430c9e8b459b854ba117b5823880.json
1978 https://www.kaggle.com/theoviel/fast-fourier-transform-denoising
../data/database/json/ka_cn/e1444fce743c825d65c7caf621d72a70.json
1979 https://www.kaggle.com/xhlulu/exploring-signal-processing-with-scipy
../data/database/json/ka_cn/db32582d873f165ce6c8ef0b7a5d3265.json
1980 https://www.kaggle.com/deeptiagl/sarimax-vs-prophet-vs-random-forest
../data/database/json/ka_cn/46490b36c1a52

../data/database/json/ka_cn/122a53b69bdce5a8aa716df7577f9f69.json
2034 https://www.kaggle.com/hamidhaghshenas/adaboostclassifier
../data/database/json/ka_cn/7f83fc222c6c166605d17037698eb39b.json
2035 https://www.kaggle.com/hsinwenchang/lgbm-parameter-tuning
../data/database/json/ka_cn/41e5d6cc378869adddcf102a96228fa4.json
2036 https://www.kaggle.com/mattjburrill/home-court-advantage-weighted-4-factors
../data/database/json/ka_cn/4d1321f2d19150a28e6a0648f0b12b1c.json
2037 https://www.kaggle.com/mshaked/women-ballers
../data/database/json/ka_cn/309a69851e0631998148077ae97ef8d9.json
2038 https://www.kaggle.com/peacefultransfer/simulate-the-tournament-based-on-your-predictions
../data/database/json/ka_cn/79e86a3057d46ad4b9479e17adf5d887.json
2039 https://www.kaggle.com/takaishikawa/no-ml-modeling
../data/database/json/ka_cn/1000c0cedfb390d0d1fe8dc23c221b3c.json
2040 https://www.kaggle.com/harshitmakkar/nlp-word2vec
../data/database/json/ka_cn/51b86560215d24b242bee6818d2829da.json
2041 http

16 https://mlart.co/item/using-face-recognition-and-a-robot-arm-to-find-grains-of-sand-that-looks-like-faces
../data/database/json/ma/dd7b4a571301b37196f324372df0732e.json
17 https://mlart.co/item/a-robot-that-uses-a-camera-and-translates-the-image-feature-to-rnn-features_-and-writes-poems-on-the-beach
../data/database/json/ma/c13a383b6152239dc4004d7ace3e494a.json
18 https://mlart.co/item/automatically-animate-digital-humans-or-your-digital-double-with-just-text-or-speech
../data/database/json/ma/7a1ecf60b2076eb4aca7260697624bda.json
19 https://mlart.co/item/a-gpt-model-trained-on-_30k-font-bios-to-generate-descriptions-of-speculative-fonts
../data/database/json/ma/fa465b4098dca8e5975fa4c431ec3dc8.json
20 https://mlart.co/item/0_1-second-samples-of-birds-tweeting-and-mapped-according-to-a-variable-audio-features-configuration-which-continuously-transforms-the-point-cloud
../data/database/json/ma/0acbb3d5ce577b17b6450dd7b7382c1e.json
21 https://mlart.co/item/a-collection-of-_ambigrammat

../data/database/json/ma/2324c3d7084c8f37b1405b04109c0556.json
64 https://mlart.co/item/video-transferred-into-next-frame-predictions-via-a-conditional-gan
../data/database/json/ma/61bde1d93c687e949b555bbbf9946bb0.json
65 https://mlart.co/item/train-a-stylegan-on-nature-images-and-interpolate-the-model-inside-of-a-vr-cinema
../data/database/json/ma/43b67ea802ee35be5f51d033d5f9d513.json
66 https://mlart.co/item/a-stylegan-trained-on-images-from-mars-mro-and-interpolated-frame-by-frame-to-create-a-3d-effect
../data/database/json/ma/0bb903f69bff9830061465c9e2532e98.json
67 https://mlart.co/item/display-installation-with-gan-interpolations-with-photos-of-eyes-with-makeup
../data/database/json/ma/c9d677c5ab713049ab664e18c25825b2.json
68 https://mlart.co/item/apply-slow-motion-and-fluid-footage-and-optical-flow-based-style-transfer-on-paradis_s-recto-verso
../data/database/json/ma/ad5fe65a4d845c0e632ae4703008e394.json
69 https://mlart.co/item/pass-the-pixels-of-a-frame-through-a-quantum-comp

../data/database/json/ma/4c8ab1b1ded849d4d44ac17c031e6d4f.json
117 https://mlart.co/item/apply-gaugan-on-charlie-chaplin-dressed-as-a-tree
../data/database/json/ma/568ba49be12e330cf5c870443d583c95.json
118 https://mlart.co/item/an-overpainted-gan-interpolation-of-a-smoking-monkey
../data/database/json/ma/02ff2c009340f92af7f4e67774b803d3.json
119 https://mlart.co/item/apply-patches-from-artworks-to-a-painting-and-guide-it-with-a-gan-discriminator
../data/database/json/ma/086c6dd67301762e7d79ca6306d4446c.json
120 https://mlart.co/item/upscaling-old-video-games-with-gaugan
../data/database/json/ma/82472baaa0041b6c4948df16011730c0.json
121 https://mlart.co/item/stylegan-trained-on-floor-plans-and-stacked-on-top-of-each-other
../data/database/json/ma/fbb6e2f05ce8a9ea9c2cf1c4453472a3.json
122 https://mlart.co/item/style-transfer-applied-to-a-collage-of-eyes
../data/database/json/ma/753b826b4e91954e377b46b93b163d9a.json
123 https://mlart.co/item/cnn-generated-blend-of-corals
../data/database/

../data/database/json/ma/215fcf6f740b4013a8e54e54f0bc438b.json
171 https://mlart.co/item/use-a-word-vector-to-generate-an-image_-and-a-discriminator-to-detect-the-word-vector-from-the-generated-image_-made-by-cncing-unfired-clay
../data/database/json/ma/89a89b02771f2ff6a5f0133f86357bc9.json
172 https://mlart.co/item/a-stylegan-trained-on-2-million-architecture-photos-and-interpolated
../data/database/json/ma/2ce4dd35bd715a8d0634350995722c5e.json
173 https://mlart.co/item/a-music-generator-trained-on-hundreds-of-thousands-of-midi-files-and-trained-with-gpt-2
../data/database/json/ma/94e42ba1df88afabca90f17ddf69c75d.json
174 https://mlart.co/item/a-3d-gan-trained-on-1000-sculptures-inspired-by-the-greek-god-dionysus
../data/database/json/ma/14c1b0e104c2875580a294015a10cf66.json
175 https://mlart.co/item/recurrent-application-of-style-tranfer_-then-processed-by-progan
../data/database/json/ma/f29619337f60c68d4ae1347b35512274.json
176 https://mlart.co/item/a-conditional-gan-trained-on-a-da

../data/database/json/ma/a7919874e3e2c5cbff1a522b91d43b26.json
224 https://mlart.co/item/visualizing-face-recognition-data
../data/database/json/ma/e16bfb7f1a0ec7b90387042abf719b93.json
225 https://mlart.co/item/use-cyclegan-to-turn-mars_-surface-into-earth-like
../data/database/json/ma/3c29dbc31143aed8ec7724540a2ae0a0.json
226 https://mlart.co/item/train-a-model-to-map-a-gesture-to-a-sound
../data/database/json/ma/f877a6c1f62259e648cd25fb02482545.json
227 https://mlart.co/item/gan-generative-images-based-on-dali
../data/database/json/ma/05131c3cd8a28dd4902028f6bce2132e.json
228 https://mlart.co/item/use-a-turing-test-chatbot-to-guide-an-improvisational-theatre-act
../data/database/json/ma/2c4e5e18420ea7f7b09aaa45845eadf0.json
229 https://mlart.co/item/generate-pictures-of-flowers-with-a-gan
../data/database/json/ma/58f53a880de6535f6f06fca8da6f07a0.json
230 https://mlart.co/item/use-gan-paintings-and-object-detection-to-remove-objects-from-a-video
../data/database/json/ma/4c43b98125ff4

../data/database/json/ma/6256a507ba6eeb1e819440ec6228d1d6.json
281 https://mlart.co/item/generate-a-logo-from-a-drawing-using-pix2pix-hd
../data/database/json/ma/b69bc7f6505a61620d4beeb38257b421.json
282 https://mlart.co/item/colorize-modern-bandw-photographs-with-a-u-net
../data/database/json/ma/b17f7c72a608308aeed7b5fc79274f38.json
283 https://mlart.co/item/use-an-image-captioning-model_-pix2code_-to-translate-design-mockups-into-a-website
../data/database/json/ma/d09b5e63af7ec5eeb111bbeb8b848207.json
284 https://mlart.co/item/a-computer-looks-at-itself-with-a-camera-and-translates-image-features-into-text-with-an-rnn
../data/database/json/ma/7c7e9cbfe72151ba794c06cf30e3ed6d.json
285 https://mlart.co/item/gan-generated-video-of-scanned-plants-inspired-by-stan-brakhage
../data/database/json/ma/2c2c6b797371c3eeca2bc9f5598c7f65.json
286 https://mlart.co/item/generate-gan-images-and-weave-the-generated-images
../data/database/json/ma/a5c4ca9253a8d12b2cdc47ce05479def.json
287 https://mlar

../data/database/json/ma/d87d2b9196fc0fbe8ade924a065208cf.json
334 https://mlart.co/item/gan-generated-images-of-architecture-plans
../data/database/json/ma/491f2c1489ef2c556f5693e6fd24070e.json
335 https://mlart.co/item/reconstruct-images-by-introploating-a-collage-of-gans-and-guiding-it-with-a-gan-discriminator
../data/database/json/ma/3a490a09faa9abf5f18a66fd8fdf4fc8.json
336 https://mlart.co/item/pix2pix-repeatedly-takes-the-generated-output-as-the-new-input-to-create-a-one-hour-long-video-of-train-windows
../data/database/json/ma/beb76abea305f5b02ea72d2f4f67b03d.json
337 https://mlart.co/item/organise-the-features-of-an-image-with-t-sne_-and-visualize-the-similarity-between-artworks
../data/database/json/ma/ceabe44fea98b111e33c78eee1c6d9f4.json
338 https://mlart.co/item/translate-two-bit-doodles-into-paintings-with-semantic-style-transfer
../data/database/json/ma/46c5abb372b75641e53c788a0d8b6bee.json
339 https://mlart.co/item/translate-image-features-from-a-graphical-user-interfac

382 https://mlart.co/item/corpus-based-visual-synthesis-recreating-the-simpsons-intro-with-the-intro-from-the-family-guy
../data/database/json/ma/bcf90255760da20d2ad436cdda13ce2d.json
383 https://mlart.co/item/create-a-text-model_-potentially-using-markov-chains_-based-on-the-debate-between-michel-foucault-and-noam-chomsky-and-performed-as-a-theatre-piece
../data/database/json/ma/fac28946aae90f2624677b35fa6ebbd9.json
384 https://mlart.co/item/a-blog-post-made-with-markov-chains-starting-with-religion-and-ending-up-discussing-adobe-after-effects
../data/database/json/ma/f2d882ee85b8fca68bd820d0109a0468.json
385 https://mlart.co/item/blues-improvisation-with-an-lstm
../data/database/json/ma/ba80d233a8aad8728474b3d4c9249399.json
386 https://mlart.co/item/algorithmically-generate-a-face-that-requires-the-least-amount-of-features-to-encode_-inspired-by-leonardo-da-vinci-and-albrecht-duerer
../data/database/json/ma/3391a9260aefd29d3ceb986dc4797d30.json
387 https://mlart.co/item/create-an-art

../data/database/json/gh/fa680195598485e3e6a3d63df1e76565.json
65 https://github.com/budach/pysster
../data/database/json/gh/be96775d35cb036b6224589c0b79ee3e.json
66 https://github.com/buddyd16/Structural-Engineering
../data/database/json/gh/af40cc942a0b8a2da0fe9f297246f6b7.json
67 https://github.com/bukosabino/financial-forecasting-challenge-gresearch
../data/database/json/gh/cb39d7765719a61cd267ea9256400762.json
68 https://github.com/burkesquires/python_biologist
../data/database/json/gh/fecae62eb4ca618a468ed8008f61a3b5.json
69 https://github.com/buzz11/productionFailures
../data/database/json/gh/a1cbe6bf4fe4083b2c5ff152b665532e.json
70 https://github.com/byukan/Marketing-Data-Science
../data/database/json/gh/255813e45371254b24f9bd2dbfaa6f6f.json
71 https://github.com/bzjin/menus
../data/database/json/gh/d555c964d10b526eac5d6461d78c83c4.json
72 https://github.com/CAChemE/learn
../data/database/json/gh/8af80a4bce4411bb717d36e603d91bc8.json
73 https://github.com/cadrev/lstm-flood-predi

../data/database/json/gh/c1c526f24e0bc0bf6bed4b538170222a.json
135 https://github.com/duncangh/FSA
../data/database/json/gh/b3bc52ed015a0c23f140fec13ca32343.json
136 https://github.com/ebrahimraeyat/Civil
../data/database/json/gh/aad9a49dea5c67f099c22af5f03f7631.json
137 https://github.com/edmundooo/more-money-more-problems
../data/database/json/gh/08075c83c6f5b85b8085a65257c2e013.json
138 https://github.com/ehsanasgari/Deep-Proteomics
../data/database/json/gh/d4fa78b1df6657e1d07fe270b58bdd64.json
139 https://github.com/EliadProject/Hotels-Data-Science
../data/database/json/gh/4ec562244057fe30b6a70aada420339f.json
140 https://github.com/eloyekunle/student_intervention/
../data/database/json/gh/1651ac8981aa2e0396ce788b107c8620.json
141 https://github.com/EricHe98/Financial-Statements-Text-Analysis
../data/database/json/gh/37cffc08833216ee05839df74e88b2b9.json
142 https://github.com/erickjtorres/AI_LegalDoc_Hackathon
../data/database/json/gh/a6bb6366d8097f0a9d27c61bede1876f.json
143 http

../data/database/json/gh/4dd632ff1bc02331767c148f0da4e70e.json
209 https://github.com/jerryxyx/EquineTrading
../data/database/json/gh/7cd6f8b15c93296c59a7bb5ecbb71d9e.json
210 https://github.com/jfzhang95/LSTM-water-table-depth-prediction
../data/database/json/gh/36315cfec1c33f287f3c0952bb9fbda1.json
211 https://github.com/jhconning/Dev-II/
../data/database/json/gh/db9ac297ee3294a58459a44621dd631f.json
212 https://github.com/jinsonfernandez/NLP_School-Budget-Project
../data/database/json/gh/8bb991cb2987a67b08cfbbd085a36244.json
213 https://github.com/jjakimoto/finance_ml
../data/database/json/gh/63896d7666222294ec4887fbc179700c.json
214 https://github.com/jlperla/ECON407_2018
../data/database/json/gh/ec1a034572c0bc154f7c481de2aa1e57.json
215 https://github.com/joelowj/Machine-Learning-and-Reinforcement-Learning-in-Finance
../data/database/json/gh/9fda97f7ffa074f7184baef9f001f8c4.json
216 https://github.com/johnfwhitesell/CensusPull/
../data/database/json/gh/b7fcc1b226fe495e4d2037ee7ec4

../data/database/json/gh/a84001fcda38ae5b098491f49a860ca0.json
286 https://github.com/mratsim/McKinsey-SmartCities-Traffic-Prediction
../data/database/json/gh/38db7d9dd73bfb2870656399fb6d1e05.json
287 https://github.com/mroberge/hydrofunctions
../data/database/json/gh/afdfaa45546703f75514f7c81d59d360.json
288 https://github.com/mschermann/forensic_accounting
../data/database/json/gh/a4fdcb5a1a3280737b3b59af395cc82a.json
289 https://github.com/muntisa/Deep-Politics
../data/database/json/gh/697b58353260e7e160d2b7f02024df03.json
290 https://github.com/Murgio/Food-Recipe-CNN
../data/database/json/gh/1da3e14643ee421ba18d0e2feaa31c2b.json
291 https://github.com/Myau5x/anti-recommender
../data/database/json/gh/8a2631aa8674da956b68fc95e7d371bf.json
292 https://github.com/nd1/DC_RestaurantViolationForecasting
../data/database/json/gh/10b0aa7731a0f4d48f50ef8adb8dae59.json
293 https://github.com/nealmcb/pr_voting_methods
../data/database/json/gh/d027da1afc4abd22593ccb3066876a66.json
294 https://g

../data/database/json/gh/2ae41828b90a71e3f473c6fc6351a7b2.json
358 https://github.com/SarahMestiri/online-retail-case
../data/database/json/gh/bf599f80a2a60ffbdd5c9d732eb67483.json
359 https://github.com/Sardhendu/PropertyClassification
../data/database/json/gh/6556585d8dacfc096f81a01549b39d00.json
360 https://github.com/scngo/SD-ambulance-allocation
../data/database/json/gh/1593c009ff4c4f2be935241ab1029643.json
361 https://github.com/sdasadia/Oil-Natural-Gas-Price-Prediction
../data/database/json/gh/5e8b09254c35a7c9356663fafc532ec8.json
362 https://github.com/SeanMcOwen/FinanceAndPython.com-BasicFinance
../data/database/json/gh/90eed86a383d801c685a6912682fff57.json
363 https://github.com/SeanMcOwen/FinanceAndPython.com-ClusteringIndustries
../data/database/json/gh/46949ccc199647c7dd77184574daf7ee.json
364 https://github.com/SeanMcOwen/FinanceAndPython.com-Derivatives
../data/database/json/gh/e2b96c1898877c55f24a9987f7c520b7.json
365 https://github.com/SeanMcOwen/FinanceAndPython.com-I

../data/database/json/gh/300666edf74636e9b8854cb927a704c7.json
431 https://github.com/zhentaoshi/econ5170
../data/database/json/gh/a83d09131428cefb5aedcc674b6f35ae.json
432 https://github.com/zischwartz/workerfatalities
../data/database/json/gh/1951ebbca315e85545d0e4e80b2d85bd.json
DONE parsed 433 items
tcp
0 https://thecleverprogrammer.com/2020/05/09/data-science-project-on-handwritten-digits/
../data/database/json/tcp/920773d09bd3b701ba7f7e2ffa3c67f6.json
1 https://thecleverprogrammer.com/2020/05/11/data-science-project-stock-price-prediction-with-machine-learning/
summarize
../data/database/json/tcp/5e4b1de7cdccebe0630f71a488c0b37d.json
2 https://thecleverprogrammer.com/2020/05/11/stock-price-prediction-with-machine-learning/
summarize
../data/database/json/tcp/79387ed366b12a64657bc2a68e32c4c4.json
3 https://thecleverprogrammer.com/2020/05/14/data-science-project-on-classification-of-text/
summarize
../data/database/json/tcp/f20a86f7ec77c06af2ce8089d8893689.json
4 https://thecleverp

../data/database/json/tcp/e6debacd3ec9322d32e91b881beb47e0.json
50 https://thecleverprogrammer.com/2020/07/19/image-classification-with-ann/
../data/database/json/tcp/42b06e332231e9ae83ba20ca44974b64.json
51 https://thecleverprogrammer.com/2020/07/20/binary-classification-model/
summarize
../data/database/json/tcp/f3673de7e14a0c2c87993bc6c473e467.json
52 https://thecleverprogrammer.com/2020/07/20/data-augmentation-in-deep-learning/
../data/database/json/tcp/b61502889e759c5c359b3fcd29c4ac89.json
53 https://thecleverprogrammer.com/2020/07/20/next-word-prediction-model/
../data/database/json/tcp/684789b21cf59ee0008285310c8c0039.json
54 https://thecleverprogrammer.com/2020/07/21/multiclass-classification/
../data/database/json/tcp/d7ff559f1b39d3441e14602ec5ce7a44.json
55 https://thecleverprogrammer.com/2020/07/21/pipelines-in-machine-learning/
summarize
../data/database/json/tcp/d7a2c854ad4c86a4774d8467a877a687.json
56 https://thecleverprogrammer.com/2020/07/21/tensorboard-for-visualizatio

../data/database/json/tcp/d19c8162d4a8667b278e4e79e85707ec.json
112 https://thecleverprogrammer.com/2020/09/12/sarima-in-machine-learning/
../data/database/json/tcp/a233e646a0c18c61d976a56e544fbb84.json
113 https://thecleverprogrammer.com/2020/09/15/python-automl-libraries/
../data/database/json/tcp/e541c206f8c2df94f2dde86bd65add9c.json
114 https://thecleverprogrammer.com/2020/09/19/machine-learning-for-healthcare/
../data/database/json/tcp/683210c1a22ba33dc70097c697e6f008.json
115 https://thecleverprogrammer.com/2020/09/20/maths-for-machine-learning/
../data/database/json/tcp/bad85aa404b7c49130a1a81322a04026.json
116 https://thecleverprogrammer.com/2020/09/21/when-do-we-need-machine-learning/
../data/database/json/tcp/aa7a8326fe8c2e3010707c3a3d7f456a.json
117 https://thecleverprogrammer.com/2020/09/22/standardscaler-in-machine-learning/
../data/database/json/tcp/4b8a96cc06d6678396473e8a7bbcf444.json
118 https://thecleverprogrammer.com/2020/09/24/instagram-filters-with-python/
../data/

168 https://thecleverprogrammer.com/2020/12/04/online-shopping-intention-analysis-with-python/
../data/database/json/tcp/9105255886079dfabc165f6a75ed323d.json
169 https://thecleverprogrammer.com/2020/12/05/sign-language-classification-with-python/
../data/database/json/tcp/1745c3bdacce69a4a0ebd4751d5e1cd6.json
170 https://thecleverprogrammer.com/2020/12/06/resume-screening-with-python/
../data/database/json/tcp/68b7ea82153b8bc0d4f2658f351debef.json
171 https://thecleverprogrammer.com/2020/12/07/sentiment-analysis-with-python/
../data/database/json/tcp/0627f2d53d1e051e8c7dddb6dce507fd.json
DONE parsed 172 items
bc
0 https://github.com/583/machine_learning_notebook
../data/database/json/bc/09a3e87997f2c6435e86e052e4a63a30.json
1 https://github.com/aaren/notedown
../data/database/json/bc/179d21367b7f97216bbadec3ec79f5e2.json
2 https://github.com/abhinavsagar/kaggle-notebooks
../data/database/json/bc/5c4f717208aea9e2370039194d72beb7.json
3 https://github.com/abulbasar/machine-learning
../d

../data/database/json/bc/087f9794d794dda99b3371bfde5355fb.json
68 https://github.com/ceos-seo/data_cube_notebooks
../data/database/json/bc/f55b4753f73d0bf5afab049e4952b89f.json
69 https://github.com/cerlymarco/MEDIUM_NoteBook
../data/database/json/bc/f7995c6ce5a98552983dd13a29ef553c.json
70 https://github.com/cgoliver/Notebooks
../data/database/json/bc/b0baf4e347c6c1b9e9a62561abe541b0.json
71 https://github.com/ChadFulton/tsa-notebooks
../data/database/json/bc/ec61791228f0115d95e33508e76c5859.json
72 https://github.com/chainer-community/chainer-colab-notebook
../data/database/json/bc/a72b62e933ee12d8d8369caea653b724.json
73 https://github.com/chambliss/Notebooks
../data/database/json/bc/27b61732d471aad105457a9b85c03d3b.json
74 https://github.com/chhayac/Machine-Learning-Notebooks
../data/database/json/bc/c27c7ad745d157383b655a0936d2063a.json
75 https://github.com/chmp/ipytest
../data/database/json/bc/9c85aefb2a557f02f80a3c1f92a4e984.json
76 https://github.com/chris1610/pbpython
../data

144 https://github.com/enakai00/jupyter_tfbook
../data/database/json/bc/adff23612e7fb12abb018357044a0158.json
145 https://github.com/equinor/segyio-notebooks
../data/database/json/bc/7e4a0720a0471cb71265a6503ce25fa5.json
146 https://github.com/erhwenkuo/deep-learning-with-keras-notebooks
../data/database/json/bc/2aec579d9a6f6412703649587a858533.json
147 https://github.com/ericmjl/bayesian-analysis-recipes
../data/database/json/bc/4bca04ba93f76bf32efc664fa4ed436d.json
148 https://github.com/erykml/medium_articles
../data/database/json/bc/d6b8d72a895ba91ee90ce42daa5ea1f3.json
149 https://github.com/espnet/notebook
../data/database/json/bc/3d751a178fc4071c2df46a276a3c5104.json
150 https://github.com/executablebooks/jupyter-book
../data/database/json/bc/876138dd553e1989f024a8c84485f014.json
151 https://github.com/explosion/spacy-notebooks
../data/database/json/bc/360eb23af8ed3e91c0e57ad6831cf615.json
152 https://github.com/falloutdurham/beginners-pytorch-deep-learning
../data/database/json

../data/database/json/bc/039bb5d15dc236acc7d45c0bf6c01534.json
219 https://github.com/jakevdp/WhirlwindTourOfPython
../data/database/json/bc/a15e6dca320257b0db83dba410428b7e.json
220 https://github.com/jalammar/simpleTensorFlowClassificationExample
../data/database/json/bc/567d58d8f83ba5214b0269aaa768fd03.json
221 https://github.com/jamesdbrock/learn-you-a-haskell-notebook
../data/database/json/bc/0065cbc76fd0832c7b3fc8ef5baed767.json
222 https://github.com/JasonKessler/Scattertext-PyData
../data/database/json/bc/29a531ea15869cecde2f5a4e932df7ea.json
223 https://github.com/jasonstrimpel/PyData-Meetup
../data/database/json/bc/12947e3098485b490699f320e05951b3.json
224 https://github.com/jayboxyz/deeplearning_cv_notes
../data/database/json/bc/97b772f42b5066712d05016301206228.json
225 https://github.com/jbagnato/machine-learning
../data/database/json/bc/7bec72139cb3fb7a90ab77f92522597d.json
226 https://github.com/jbwhit/berkeley-jupyter-notebook
../data/database/json/bc/afabc8f37d6afe61aba

298 https://github.com/LSSTC-DSFP/LSSTC-DSFP-Sessions
../data/database/json/bc/fe73085f0e581d921e2b3bcb8a193f54.json
299 https://github.com/lyhue1991/spark_tutorial
../data/database/json/bc/4412cc00e2b554fe804d64e425a50ca4.json
300
300 https://github.com/m2dsupsdlclass/lectures-labs
../data/database/json/bc/e7ca9db7ed7a36759b6047e4e1457c33.json
301 https://github.com/maartenbreddels/ipyvolume
../data/database/json/bc/4ae1638ee756ef2275079ff813152d2c.json
302 https://github.com/maartenbreddels/ipywebrtc
../data/database/json/bc/3a8bc385b9132f76f0aab02ff5d7206e.json
303 https://github.com/MaayanLab/clustergrammer-widget
../data/database/json/bc/fb8acf22f1cd7c051d94005dd8690fe7.json
304 https://github.com/MaayanLab/Zika-RNAseq-Pipeline
../data/database/json/bc/06ecd7b70dbfa9da4194c82be7d7666e.json
305 https://github.com/man-group/notebooker
../data/database/json/bc/7dbbc53cf13b70e9722a59d35ed6d1a8.json
306 https://github.com/mandli/numerical-methods-pdes
../data/database/json/bc/bb43ebe25

375 https://github.com/patrickvonplaten/notebooks
../data/database/json/bc/1c06291bd93e7e3a1eb27f510ed4dddf.json
376 https://github.com/pbugnion/gmaps
../data/database/json/bc/57cc30fecb91a120d6ee1f5970f90d92.json
377 https://github.com/PegasusWang/notebooks
../data/database/json/bc/1e1a978c17ff630b9b01eec4dd1b4e92.json
378 https://github.com/peterroelants/notebooks
../data/database/json/bc/f056283a4fc257e4e8bdf0aadaa06cb7.json
379 https://github.com/pgmpy/pgmpy_notebook
../data/database/json/bc/5a3138acb4c40efe1a714b08bfd80ef1.json
380 https://github.com/piegu/fastai-projects
../data/database/json/bc/ce86aa06bdc1a96127a973d25194e7ea.json
381 https://github.com/pierpaolo28/Data-Visualization
../data/database/json/bc/e729ad383dc332c9ec75c39d6b88becd.json
382 https://github.com/pierrelux/notebooks
../data/database/json/bc/2bd2b9246e8afc17df686280401bde4f.json
383 https://github.com/PipelineAI/katacoda-notebooks
../data/database/json/bc/497c08b6f0299b7573a0b91ff9e4051f.json
384 https://gi

../data/database/json/bc/7800e74d57c2cae54667d48a36c8081e.json
448 https://github.com/sanzgiri/deeplearning.ai
../data/database/json/bc/92ce589d0fb1f150bceeccacb56e4c7e.json
449 https://github.com/sat28/githubcommit
../data/database/json/bc/d5c9356efba552e4b4b600d6d0a4ecff.json
450 https://github.com/scipy/scipy-cookbook
../data/database/json/bc/7da9ee048dd63701b4fe23d276f9e86e.json
451 https://github.com/scoutbee/pytorch-nlp-notebooks
../data/database/json/bc/cacc7ab360f20627d3048c0919d88158.json
452 https://github.com/sean-parent/notebook
../data/database/json/bc/dc5b8d3d5341738d1835a11f697f85dd.json
453 https://github.com/seranus/faceswap-notebooks
../data/database/json/bc/f3485f6c0924c8d88a75b251b0d2e2da.json
454 https://github.com/sergiogama/notebook
../data/database/json/bc/cae0db40e47ad3123b410df6881288a2.json
455 https://github.com/sgugger/Deep-Learning
../data/database/json/bc/5249d5514c30f56e43b3c30fcb37f642.json
456 https://github.com/shakedzy/notebooks
../data/database/json

../data/database/json/bc/2be5402f61f59d52920d60124bc27e2b.json
520 https://github.com/watson-developer-cloud/assistant-improve-recommendations-notebook
../data/database/json/bc/47e4628d41a3f2f35e8d2519d23441f8.json
521 https://github.com/wenmin-wu/jupyter-tabnine
../data/database/json/bc/6b60cb2122cc05a3eeb0d19e7a20490a.json
522 https://github.com/wesm/pydata-book
../data/database/json/bc/bb488a09b19025fafa0ccbd4ecec28e5.json
523 https://github.com/wilfredinni/python-cheatsheet
../data/database/json/bc/563d91784cb62dcafc7c6cbbffacff63.json
524 https://github.com/willb/fraud-notebooks
../data/database/json/bc/9013a17f0df52aa58452a981891e540b.json
525 https://github.com/WittyOrator/JupyterNotebook
../data/database/json/bc/3b280b9286f653faa7fdd5abaaa328b6.json
526 https://github.com/wy-ei/notebook
../data/database/json/bc/1771c9f25304aad3ee86cb0ee15b204b.json
527 https://github.com/yandexdataschool/mlhep2019
../data/database/json/bc/01bfab1498f513ecc56415c3a296a182.json
528 https://github

LangDetectException: No features in text.

In [32]:
# zero-shot categorization is computationally intensive,
# so keep it out of the loop and process it separately
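
The classification cell further down calls `categorize(text, labels)` and reads `category`, `score` and `runtime` from the result. The function itself is not shown in this part of the notebook, so the following is only a minimal sketch of a wrapper with that return shape. `categorize_sketch`, its `classifier` argument and `keyword_stub` are hypothetical names introduced for illustration; the stub is a toy keyword scorer so the cell runs without downloading a model. A Hugging Face zero-shot pipeline could be plugged in instead, as hinted in the trailing comment.

```python
import time


def categorize_sketch(text, labels, classifier):
    """Hypothetical stand-in for the notebook's categorize(); not the original code.

    `classifier` is any callable returning {'labels': [...], 'scores': [...]}.
    """
    start = time.time()
    out = classifier(text, labels)
    # pick the best-scoring label without assuming the output is sorted
    best = max(range(len(out['labels'])), key=lambda k: out['scores'][k])
    return {
        'category': out['labels'][best],
        'score': float(out['scores'][best]),
        'runtime': round(time.time() - start, 3),
    }


def keyword_stub(text, labels):
    """Toy scorer (assumption, illustration only): counts label words found in the text."""
    words = set(text.lower().split())
    scores = [
        sum(w in words for w in label.lower().replace('&', ' ').split())
        for label in labels
    ]
    total = sum(scores) or 1
    return {'labels': list(labels), 'scores': [s / total for s in scores]}


res = categorize_sketch('stock price prediction for finance',
                        ['Finance', 'Healthcare'], keyword_stub)
print(res['category'], res['score'])

# With the transformers library installed, a real model could be used instead:
#   from transformers import pipeline
#   zs = pipeline('zero-shot-classification')
#   real = lambda t, l: zs(t, candidate_labels=l)
#   categorize_sketch(text, cat, real)
```

Passing the classifier in as an argument keeps the expensive model load in one place, which fits the intent of running the categorization outside the main parsing loop.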

In [33]:
print(cat)
print(subcat)

['Accommodation & Food', 'Accounting', 'Agriculture', 'Banking & Insurance', 'Biotechnological & Life Sciences', 'Construction & Engineering', 'Economics', 'Education & Research', 'Emergency & Relief', 'Finance', 'Government and Public Works', 'Healthcare', 'Justice, Law and Regulations', 'Manufacturing', 'Media & Publishing', 'Miscellaneous', 'Physics', 'Real Estate, Rental & Leasing', 'Utilities', 'Wholesale & Retail']
['Failure', 'Food', 'Fraud', 'General', 'Genomics', 'Insurance and Risk', 'Judicial Applied', 'Life-sciences', 'Machine Learning', 'Maintenance', 'Management and Operations', 'Marketing', 'Material Science', 'Physical', 'Policy and Regulatory', 'Politics', 'Preventative and Reactive', 'Quality', 'Real Estate', 'Rental & Leasing', 'Restaurant', 'Retail', 'School', 'Sequencing', 'Social Policies', 'Student', 'Textual Analysis', 'Tools', 'Tourism', 'Trading & Investment', 'Transportation', 'Valuation', 'Water & Pollution', 'Wholesale']
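
The classification cell below repeats the same category and subcategory assignments for each text source (`sum_t5`, `sum_nltk`, `description`, `title`), changing only the key prefix. As a possible simplification, the repeated block could be folded into one helper. This is a sketch only: `apply_categorization` and `fake_categorize` are hypothetical names, and `fake_categorize` merely mimics the `category` / `score` / `runtime` result shape that the loop reads.

```python
def apply_categorization(row, prefix, text, cat, subcat, categorize):
    """Write <prefix>_category*/<prefix>_subcategory* keys into row.

    Returns {'category': (label, score), 'subcategory': (label, score)}
    so the caller can also set the final category fields.
    """
    picked = {}
    for kind, labels in (('category', cat), ('subcategory', subcat)):
        res = categorize(text, labels)
        row[f'{prefix}_{kind}'] = res['category']
        row[f'{prefix}_{kind}_score'] = res['score']
        row[f'{prefix}_{kind}_runtime'] = res['runtime']
        picked[kind] = (res['category'], res['score'])
    return picked


def fake_categorize(text, labels):
    """Stand-in for the real categorize(): always picks the first label."""
    return {'category': labels[0], 'score': 0.5, 'runtime': 0.0}


row = {'title': 'fraud notebooks'}
picked = apply_categorization(row, 'title', row['title'],
                              ['Finance', 'Physics'], ['Fraud', 'Food'],
                              fake_categorize)
print(sorted(row))
```

With such a helper, each branch of the loop would reduce to one call followed by unpacking `picked` into `c`, `c_score`, `sc` and `sc_score`.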


In [46]:
# classification

folder = '../data/database/json/'
subfolder = os.listdir(folder)
#print(subfolder)

#transform = ['ka_c', 'ka_cn', 'ka_d', 'ka_dn', 'ma', 'gh', 'tcp', 'bc']
transform = ['ka_c', 'ka_cn', 'ma', 'gh', 'tcp', 'bc']
#transform = ['ma']

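# switches for this run
recreate_category = False   # True: re-categorize rows that already have a category
save = True                 # False: print the row instead of writing it back
categorize_t5 = False       # use the t5 summary as input
categorize_nltk = True      # use the nltk summary as input
categorize_fallback = True  # fall back to description or title

quit = 0  # stop after this many rows (0 = no limit)
i = j = 0  # i counts rows read, j counts rows categorized
for item in subfolder:
    print('folder', item)
    fp = os.path.join(folder, item)
    if os.path.isdir(fp) and item in transform:
        print('###')
        print(item)
        files = os.listdir(fp)
        print('files in folder:', len(files))
        for file in files:
            row = load_data(os.path.join(folder, item, file), fromJson=True)
            #print(row)

            print('row:', i, 'item:', j, 'link:', row['link'], 'file:', file)

            # zero shot categorization
            if 'category' not in row or row.get('category') == '' or recreate_category == True:
                print('categorize')
                start = time.time()
                j += 1

                # defaults, so a row where every source is skipped never
                # inherits the values of the previous row
                c = sc = ''
                c_score = sc_score = 0

                # create category and subcategory from t5
                if 'sum_t5' in row and row['sum_t5'] != '' and categorize_t5 == True: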
                    s = row['sum_t5']
                    res = categorize(s, cat)
                    #row['t5_category_raw'] = res
                    c = row['t5_category'] = res['category']
                    c_score = row['t5_category_score'] = res['score']
                    row['t5_category_runtime'] = res['runtime']
                    print('t5 category', res['runtime'], 'sec')

                    res = categorize(s, subcat)
                    #row['t5_subcategory_raw'] = res
                    sc = row['t5_subcategory'] = res['category']
                    sc_score = row['t5_subcategory_score'] = res['score']
                    row['t5_subcategory_runtime'] = res['runtime']
                    print('t5 subcategory', res['runtime'], 'sec')
                else:
                    print('t5 skipped')

                # create category and subcategory from nltk
                if 'sum_nltk' in row and row['sum_nltk'] != '' and categorize_nltk == True:
                    s = row['sum_nltk']
                    res = categorize(s, cat)
                    #print(res)
                    #row['nltk_category_raw'] = res
                    c = row['nltk_category'] = res['category']
                    c_score = row['nltk_category_score'] = res['score']
                    row['nltk_category_runtime'] = res['runtime']
                    print('nltk category', res['runtime'], 'sec')

                    res = categorize(s, subcat)
                    #print(res)
                    #row['nltk_subcategory_raw'] = res
                    sc = row['nltk_subcategory'] = res['category']
                    sc_score = row['nltk_subcategory_score'] = res['score']
                    row['nltk_subcategory_runtime'] = res['runtime']
                    print('nltk subcategory', res['runtime'], 'sec')
                else:
                    print('nltk skipped')

                # create category and subcategory from description or title if not already done
                if categorize_fallback and 't5_category' not in row and 'nltk_category' not in row:
                    if len(row['description']) > 0:
                        s = row['description']
                        res = categorize(s, cat)
                        #row['description_category_raw'] = res
                        c = row['description_category'] = res['category']
                        c_score = row['description_category_score'] = res['score']
                        row['description_category_runtime'] = res['runtime']
                        print('description category', res['runtime'], 'sec')

                        res = categorize(s, subcat)
                        #row['description_subcategory_raw'] = res
                        sc = row['description_subcategory'] = res['category']
                        sc_score = row['description_subcategory_score'] = res['score']
                        row['description_subcategory_runtime'] = res['runtime']
                        print('description subcategory', res['runtime'], 'sec')
                    else:
                        s = row['title']
                        if s != '':
                            res = categorize(s, cat)
                            #row['title_category_raw'] = res
                            c = row['title_category'] = res['category']
                            c_score = row['title_category_score'] = res['score']
                            row['title_category_runtime'] = res['runtime']
                            print('title category', res['runtime'], 'sec')

                            res = categorize(s, subcat)
                            #row['title_subcategory_raw'] = res
                            sc = row['title_subcategory'] = res['category']
                            sc_score = row['title_subcategory_score'] = res['score']
                            row['title_subcategory_runtime'] = res['runtime']
                            print('title subcategory', res['runtime'], 'sec')
                        else:
                            print('nothing found to categorize')
                            c = sc = ''
                            c_score = sc_score = 0
                            j -= 1

                row['category'] = c
                row['category_score'] = c_score
                row['subcategory'] = sc
                row['subcategory_score'] = sc_score

                end = time.time()
                dur = round(end-start, 3)
                row['runtime_cat'] = dur
                
                fp = os.path.join(folder, item, file)
                if save == True:
                    store_data(row, fp, toJson=True)
                else:
                    print('NOT SAVED')
                    print(row)
            
            i += 1
            
            if i%100 == 0:
                print(i)
            
            if quit != 0 and i >= quit:
                break
    if quit != 0 and i >= quit:
        break
            
print('DONE parsed', i, 'items')

folder bc
###
bc
files in folder: 541
row: 0 item: 0 link: https://github.com/HazyResearch/cs145-notebooks-2016 file: 003eb1b987f35ed6d88b1c7930e6057f.json
row: 1 item: 0 link: https://github.com/marsggbo/deeplearning.ai_JupyterNotebooks file: 004a0f102a03862a74da782e6a796f80.json
categorize
t5 skipped
nltk skipped
title category 6.505 sec
title subcategory 4.751 sec
row: 2 item: 1 link: https://github.com/davidbp/lxmls-notebooks file: 004a915abfded4001d49385a09f77f19.json
categorize
t5 skipped
nltk skipped
title category 2.722 sec
title subcategory 3.852 sec
row: 3 item: 2 link: https://github.com/jamesdbrock/learn-you-a-haskell-notebook file: 0065cbc76fd0832c7b3fc8ef5baed767.json
row: 4 item: 2 link: https://github.com/ageron/tf2_course file: 00708ba1c3a32ce45749ebcb633e8c5a.json
row: 5 item: 2 link: https://github.com/InsightDataLabs/ipython-notebooks file: 00b2db9f2524a95318f16eed6e605dd0.json
categorize
t5 skipped
nltk skipped
title category 2.846 sec
title subcategory 3.819 sec
r

title category 2.375 sec
title subcategory 3.534 sec
row: 74 item: 11 link: https://github.com/d2l-ai/d2l-book file: 22c958fbc28addccdf31a339fb688df7.json
row: 75 item: 11 link: https://github.com/ysyisyourbrother/SYSU_Notebook file: 239a0f9778f3b4c3d603a509fdd5312f.json
row: 76 item: 11 link: https://github.com/Alireza-Akhavan/rnn-notebooks file: 23d36f2684274e246b3377e85acbad53.json
row: 77 item: 11 link: https://github.com/rtidatascience/connected-nx-tutorial file: 2479d3699677cfbd582c51fc81d35bdc.json
row: 78 item: 11 link: https://github.com/thibo73800/tensorflow2.0-examples file: 253fed001f9b271e993a34a58f5c9abb.json
row: 79 item: 11 link: https://github.com/falloutdurham/beginners-pytorch-deep-learning file: 255767b67037b0153ba095c81a05a89c.json
row: 80 item: 11 link: https://github.com/dformoso/sklearn-classification file: 25fc184a3f7b7c0755cb9aa959326929.json
row: 81 item: 11 link: https://github.com/qutip/qutip-notebooks file: 278ec77fa645c2f1710cf32ff8af6a08.json
row: 82 ite

title category 4.537 sec
title subcategory 6.847 sec
row: 151 item: 15 link: https://github.com/OTRF/notebooks-forge file: 4dfea1bae4708d4317f7e570aa7322c0.json
row: 152 item: 15 link: https://github.com/uwdata/visualization-curriculum file: 4e4a4fa94d836e0555b197ae6a5743da.json
row: 153 item: 15 link: https://github.com/IBM/nodejs-in-notebooks file: 4e686c249c8efbd71dbfdb18386e4338.json
row: 154 item: 15 link: https://github.com/velocyto-team/velocyto-notebooks file: 502b31c0cd0f584eda981914ef8360c7.json
categorize
t5 skipped
nltk skipped
title category 4.447 sec
title subcategory 4.61 sec
row: 155 item: 16 link: https://github.com/kaleko/CourseraML file: 5067d72f4fca5b73c8c5a5c3f1edee8e.json
row: 156 item: 16 link: https://github.com/timsainb/python_spectrograms_and_inversion file: 506ef9f98987c54b0d315037a5a988ea.json
row: 157 item: 16 link: https://github.com/GeostatisticsLessons/GeostatisticsLessonsNotebooks file: 518989b27d43d3ae7b293493115c2982.json
row: 158 item: 16 link: https

title category 2.743 sec
title subcategory 3.831 sec
row: 216 item: 24 link: https://github.com/gaoxuesong/KerasNotebook file: 729d0290ce7b53c367b979bd5a8b76fe.json
row: 217 item: 24 link: https://github.com/azer/notebook file: 72a8cfb46d095370b5bd86fc3eb7408a.json
row: 218 item: 24 link: https://github.com/ledmaster/notebooks_tutoriais file: 733b401a95f63056d62b248b2a49fc8d.json
row: 219 item: 24 link: https://github.com/biocore/American-Gut file: 733d3661df3fbe2f3da6919dcb4c5981.json
row: 220 item: 24 link: https://github.com/planet-os/notebooks file: 733e84a3ffd9a565930cac5d5ce482e5.json
row: 221 item: 24 link: https://github.com/CODAIT/covid-notebooks file: 73cca01b71841d8b9dd3d502aed6688f.json
row: 222 item: 24 link: https://github.com/neuwangmeng/Keras_Jupyter_Notebooks file: 73f3edadceb2beb5c8b64324d23589ae.json
categorize
t5 skipped
nltk skipped
title category 3.425 sec
title subcategory 4.617 sec
row: 223 item: 25 link: https://github.com/d2l-ai/1day-notebooks file: 757014471f

title category 2.826 sec
title subcategory 4.357 sec
row: 290 item: 33 link: https://github.com/InsightSoftwareConsortium/SimpleITK-Notebooks file: 8e07512583080008bbfbbad7595123f5.json
row: 291 item: 33 link: https://github.com/codingforentrepreneurs/Notebooks file: 8ed65267940acc24b073cf0b9fee9af9.json
row: 292 item: 33 link: https://github.com/Yorko/python_intro file: 8ef7fc02e4c7f514177949953c0387a2.json
row: 293 item: 33 link: https://github.com/willb/fraud-notebooks file: 9013a17f0df52aa58452a981891e540b.json
categorize
t5 skipped
nltk skipped
title category 2.516 sec
title subcategory 3.535 sec
row: 294 item: 34 link: https://github.com/ageron/handson-ml file: 9094c555ec82851cbd778860a314c1fd.json
row: 295 item: 34 link: https://github.com/hardmaru/pytorch_notebooks file: 90e6d9f3028284ef22a32035b4df8fc9.json
row: 296 item: 34 link: https://github.com/aws/amazon-sagemaker-examples file: 90f8151eacad20743aecb34c18fc39f9.json
row: 297 item: 34 link: https://github.com/root-project

title category 2.082 sec
title subcategory 2.923 sec
row: 357 item: 38 link: https://github.com/markjay4k/YOLO-series file: adec7451694e636aad57b37956f5b6aa.json
row: 358 item: 38 link: https://github.com/enakai00/jupyter_tfbook file: adff23612e7fb12abb018357044a0158.json
row: 359 item: 38 link: https://github.com/Anaconda-Platform/nbpresent file: aeb2a50579c6786141be38fec523849f.json
row: 360 item: 38 link: https://github.com/ageron/handson-ml2 file: aefc5754a57db8bc1f0d34890cb3e6a4.json
row: 361 item: 38 link: https://github.com/Tianxiaomo/tensorflow_notebook file: af4afec85e283d5a715642a122976a4c.json
row: 362 item: 38 link: https://github.com/jbwhit/berkeley-jupyter-notebook file: afabc8f37d6afe61abacb780dd926267.json
row: 363 item: 38 link: https://github.com/dask/old-dask-examples file: afc949244b908fca60b166fa4f9442fa.json
row: 364 item: 38 link: https://github.com/gpuopenanalytics/demo-docker file: afcdfd15c4279201afb96984a660b00b.json
row: 365 item: 38 link: https://github.com

title category 3.363 sec
title subcategory 4.577 sec
row: 429 item: 47 link: https://github.com/park-python/course file: cf2bb2ee11f5fd13c3cf5638b227f9b6.json
row: 430 item: 47 link: https://github.com/DB2-Samples/db2jupyter file: cf57472340f7af4a6216c15678e50b28.json
row: 431 item: 47 link: https://github.com/danilobellini/notebooks file: d007a3b26354ab9fe19b24b2fc244949.json
row: 432 item: 47 link: https://github.com/bqplot/bqplot file: d0c121f787c0dd8d4bc52ef18d62d668.json
row: 433 item: 47 link: https://github.com/codeneuro/notebooks file: d10d3b71d6e9365216a46d976843e396.json
row: 434 item: 47 link: https://github.com/jessepisel/5minutesofpython file: d16b30de788ac7436278bb4e1837261b.json
row: 435 item: 47 link: https://github.com/ResearchComputing/xsede_2015 file: d17dd0db6e175322c29297cd941d7ba8.json
row: 436 item: 47 link: https://github.com/bokeh/bokeh-notebooks file: d188745a9bb31e6d812d1d2d50cbc4de.json
row: 437 item: 47 link: https://github.com/yourwanghao/CMUComputationalP

title category 2.334 sec
title subcategory 3.482 sec
500
row: 500 item: 54 link: https://github.com/mwitiderrick/stockprice file: ec4248e39932f4d0d63106162fb68acc.json
row: 501 item: 54 link: https://github.com/ChadFulton/tsa-notebooks file: ec61791228f0115d95e33508e76c5859.json
row: 502 item: 54 link: https://github.com/inkandswitch/livebook file: ed293a5cb4ac7e24a4eeca0af913ebff.json
row: 503 item: 54 link: https://github.com/UCIDataScienceInitiative/PredictiveModeling_withPython file: ef29bf72967bacee968c1e5680ff4036.json
row: 504 item: 54 link: https://github.com/pixiedust/pixiedust file: ef397f98126a2a10a3cfd440d365e44a.json
row: 505 item: 54 link: https://github.com/tritemio/jupyter_notebook_beginner_guide file: ef95781d0a3a5a952c9913c5c8d405a9.json
row: 506 item: 54 link: https://github.com/peterroelants/notebooks file: f056283a4fc257e4e8bdf0aadaa06cb7.json
row: 507 item: 54 link: https://github.com/ogrisel/notebooks file: f0a517b4b3ef04713c0bf519f3d935b7.json
categorize
t5 skip

row: 663 item: 57 link: https://github.com/ankitkariryaa/ambulanceSiteLocation file: 4716c5589ede7deb3e073baccb6b061c.json
row: 664 item: 57 link: https://github.com/kaumaron/Data_Science/ file: 48974e02c4860ce6d930bbf999d32f0c.json
row: 665 item: 57 link: https://github.com/RealRadOne/Gyani-The-Loan-Eligibility-Predictor file: 4959521f4c6de5797925d428ddcb6ade.json
row: 666 item: 57 link: https://github.com/pratishthakapoor/RetailReplenishement/ file: 4a337662677a5eaf263eb042f3038d16.json
row: 667 item: 57 link: https://github.com/sky-t/hack-or-emergency-response file: 4b515e0400df0b497c9dd2c6ee98ae13.json
row: 668 item: 57 link: https://github.com/DocVaughan/MCHE485---Mechanical-Vibrations file: 4b6fd80e28d158cc3a5893081413b486.json
row: 669 item: 57 link: https://github.com/rawillis98/alpaca file: 4b80cbd3255587a39412c53e52f8b763.json
row: 670 item: 57 link: https://github.com/aayushmudgal/Reducing-Manufacturing-Failures file: 4baf0b0babd88e6b684a47bc119c876a.json
row: 671 item: 57 l

row: 789 item: 57 link: https://github.com/davidmasse/US-supreme-court-prediction file: 9517ced35400a34581aea80e629f8e85.json
row: 790 item: 57 link: https://github.com/talmo/leap file: 9699371c66c058d7ec7fadad74006130.json
row: 791 item: 57 link: https://github.com/anki1909/Recruit-Restaurant-Visitor-Forecasting file: 96a609da924b8f21b7ae909ef2a1aa95.json
row: 792 item: 57 link: https://github.com/hep-lbdl/CaloGAN file: 96cc22b31d725200b3fd11c71d20e708.json
row: 793 item: 57 link: https://github.com/datadesk/lapd-crime-classification-analysis file: 96ed0b93ca121ec2cba66d031301aab5.json
row: 794 item: 57 link: https://github.com/IBM-DSE/CyberShop-Analytics file: 980de799311518374c7a9bcfa58cae5a.json
row: 795 item: 57 link: https://github.com/datakind/datadive-gates92y-proj3-form990 file: 9967a9915c38d2f4bded9ed04b7c1653.json
row: 796 item: 57 link: https://github.com/ritchie46/anaStruct file: 9a31ce95ad65d0d574b59c22f2c09cbd.json
row: 797 item: 57 link: https://github.com/everAspiring/

row: 914 item: 57 link: https://github.com/usnistgov/modelmeth file: e02ce276c9c1eb09b819d9dc05ce815a.json
row: 915 item: 57 link: https://github.com/HowardNTUST/Marketing-Data-Science-Application file: e034642ae0824e4321fa8f5b07514042.json
row: 916 item: 57 link: https://github.com/aeronetlab/emergency-mapping file: e068c45ba73ec0fb9c53cc9c66d431ea.json
row: 917 item: 57 link: https://github.com/DFS-UCU/UkrainianAgriculture file: e0ec8a154a7017bb28f101970f474fb4.json
row: 918 item: 57 link: https://github.com/BrianChevalier/StructPy file: e11792e8a355f7b3412f350215d1a942.json
row: 919 item: 57 link: https://github.com/apbecker/Systemic_Risk/ file: e29a8dcfac036da6e124c307af08694c.json
row: 920 item: 57 link: https://github.com/firmai/interactive-corporate-report file: e2a1abafc620c0b92bef2fadf95d6c66.json
row: 921 item: 57 link: https://github.com/SeanMcOwen/FinanceAndPython.com-Derivatives file: e2b96c1898877c55f24a9987f7c520b7.json
row: 922 item: 57 link: https://github.com/GirrajMa

row: 1052 item: 57 link: https://www.kaggle.com/c/open-images-instance-segmentation-rvc-2020 file: 62583876bc0eb7e0d34a6d2d1018172d.json
row: 1053 item: 57 link: https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification file: 63d873fdd56b7f2e698a175c827fd29b.json
row: 1054 item: 57 link: https://www.kaggle.com/c/home-credit-default-risk file: 63dbe85abb8e2ca0f27775a2dcf5b4e1.json
row: 1055 item: 57 link: https://www.kaggle.com/c/alaska2-image-steganalysis file: 647c8d2facd6bc052331da47849e6ed2.json
row: 1056 item: 57 link: https://www.kaggle.com/c/painter-by-numbers file: 66862c6ca51e3094a8aa11c5247a3191.json
row: 1057 item: 57 link: https://www.kaggle.com/c/LANL-Earthquake-Prediction file: 671485ce26453d7870c3d406d6815012.json
row: 1058 item: 57 link: https://www.kaggle.com/c/denoising-dirty-documents file: 6718bb136e941eda0e6f5e1e0e28c78e.json
row: 1059 item: 57 link: https://www.kaggle.com/c/bosch-production-line-performance file: 67874b819ab11f4f0ba26905260187a2.jso

row: 1176 item: 57 link: https://www.kaggle.com/c/restaurant-revenue-prediction file: ef5ed556cf64908234d405471b70ca33.json
row: 1177 item: 57 link: https://www.kaggle.com/c/womens-machine-learning-competition-2018 file: ef6010511182af0336efb2a681b12f8b.json
row: 1178 item: 57 link: https://www.kaggle.com/c/google-cloud-ncaa-march-madness-2020-division-1-womens-tournament file: efd157a4ff023052e0c318e14dbacfde.json
row: 1179 item: 57 link: https://www.kaggle.com/c/DontGetKicked file: f056238e96d963049c3492cdf4de8f8d.json
row: 1180 item: 57 link: https://www.kaggle.com/c/vsb-power-line-fault-detection file: f0f6a5def6dd3e2f9b15bd943d54ed29.json
row: 1181 item: 57 link: https://www.kaggle.com/c/abstraction-and-reasoning-challenge file: f1b8d78bbe5c3441abb14e95f265c987.json
row: 1182 item: 57 link: https://www.kaggle.com/c/m5-forecasting-accuracy file: f26016d0466158fed9010f0e16afddfb.json
row: 1183 item: 57 link: https://www.kaggle.com/c/rsna-str-pulmonary-embolism-detection file: f279d3

row: 1310 item: 57 link: https://www.kaggle.com/hyeonho/imaterialist-fashion-2019-at-fgvc6-eda file: 0e73ebe2b26a58e2ee6d1c5edf8310a5.json
row: 1311 item: 57 link: https://www.kaggle.com/hsinwenchang/randomforestclassifier file: 0ea49c8a1fdaf507871f96461178b515.json
row: 1312 item: 57 link: https://www.kaggle.com/ratan123/march-madness-2020-ncaam-simple-lightgbm-on-kfold file: 0eaf4a69aef8211de421a403704c94e6.json
row: 1313 item: 57 link: https://www.kaggle.com/hli2020/googld-ai-visual-relationship-data-exploration file: 0ec7114b934fccce8cf31a69b1d9431a.json
row: 1314 item: 57 link: https://www.kaggle.com/parulpandey/decoding-march-madness file: 0eccd82fcf85bb622d0553512554633a.json
row: 1315 item: 57 link: https://www.kaggle.com/shaman89/yandex-praktikum-pytorch-train-baseline-lb-0-699 file: 0ece75cc8039e7cea6e5b60092d77a0f.json
row: 1316 item: 57 link: https://www.kaggle.com/tanumoynandy/diabeticretinopathyvgg16-finetuning file: 0f19737b08b3a078c38042a9b156a02a.json
row: 1317 item: 5

row: 1440 item: 57 link: https://www.kaggle.com/juliaelliott/basic-starter-kernel-ncaa-men-s-dataset file: 1d154df00493747d390ae8952335800a.json
row: 1441 item: 57 link: https://www.kaggle.com/rejasupotaro/let-s-cook-model file: 1d30026b0d203bf0bb6bd56d7fa35112.json
row: 1442 item: 57 link: https://www.kaggle.com/gomezp/complete-beginner-s-guide-eda-keras-lb-0-93 file: 1d5a4a583fb2fee2e1c5ca5199557b6e.json
row: 1443 item: 57 link: https://www.kaggle.com/wti200/exploratory-analysis-nyc-taxi-trip file: 1d5fc34e82acf48a01ceb7ea4351822b.json
row: 1444 item: 57 link: https://www.kaggle.com/xhlulu/recursion-2-headed-efficientnet-2-stage-training file: 1d71402540385389f789609aafd33503.json
row: 1445 item: 57 link: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets file: 1d7a95eb2891a84d0f3b47e8beacaef1.json
row: 1446 item: 57 link: https://www.kaggle.com/junkoda/finding-a-pattern-in-difficulty-2 file: 1d8f8f77cddd8a7fd1d941253f232186.json
row: 1447 item: 57 link: http

row: 1572 item: 57 link: https://www.kaggle.com/erikbruin/careervillage-org-data-exploration file: 307f5dd39876dc0de9d584086cf737ec.json
row: 1573 item: 57 link: https://www.kaggle.com/mshaked/women-ballers file: 309a69851e0631998148077ae97ef8d9.json
row: 1574 item: 57 link: https://www.kaggle.com/cdeotte/fun-data-animation file: 30aa30cb4833252acf9190624c3f5d31.json
row: 1575 item: 57 link: https://www.kaggle.com/shivamb/3-impact-of-ctl-content-tone-language-cola file: 30c4a13dbeaaa8e2aab6abb41b23c465.json
row: 1576 item: 57 link: https://www.kaggle.com/stalkermustang/converting-lyft-dataset-to-kitty-format file: 30d32aff5567f53ce0ea1856960a9ace.json
row: 1577 item: 57 link: https://www.kaggle.com/ncchen/recurrence-relation file: 30f554b75d6d5a9b10ea22c45a99ed10.json
row: 1578 item: 57 link: https://www.kaggle.com/divsinha/sentiment-analysis-countvectorizer-tf-idf file: 3104baf2fb125aa69d13d22413bab124.json
row: 1579 item: 57 link: https://www.kaggle.com/cdeotte/pseudo-labeling-qda-0-

row: 1711 item: 57 link: https://www.kaggle.com/cdeotte/dog-autoencoder file: 417317d6e319fc58f53eb6a248a5a9e2.json
row: 1712 item: 57 link: https://www.kaggle.com/kaushal2896/kannada-mnist-using-cnn file: 41aa7bb029c174ff1c74f38627f12128.json
row: 1713 item: 57 link: https://www.kaggle.com/ashishpatel26/bird-eye-view-of-two-sigma-nn-approach file: 41b5b771aca93a3f6a43c700c1926f15.json
row: 1714 item: 57 link: https://www.kaggle.com/kneroma/m5-forecast-v2-python file: 41dd1ab94cec26da3a160d2db6b8eecc.json
row: 1715 item: 57 link: https://www.kaggle.com/hsinwenchang/lgbm-parameter-tuning file: 41e5d6cc378869adddcf102a96228fa4.json
row: 1716 item: 57 link: https://www.kaggle.com/akasharidas/plant-pathology-2020-in-pytorch file: 420300ac3cc8cf54797bd53add0a0acb.json
row: 1717 item: 57 link: https://www.kaggle.com/jonathanbesomi/private-test-not-that-private-afterall file: 4216c8364cfa37a7c95d0ce799246f1c.json
row: 1718 item: 57 link: https://www.kaggle.com/tolgadincer/landmark-recognition

row: 1845 item: 57 link: https://www.kaggle.com/saitanya/audio-recognition file: 4efd0fe4587bf16a56b17d9edddcffee.json
row: 1846 item: 57 link: https://www.kaggle.com/dimitreoliveira/cloud-segmentation-with-utility-scripts-and-keras file: 4f382c9f7ecb4d4c57f0198b5226086d.json
row: 1847 item: 57 link: https://www.kaggle.com/xhlulu/densenet-transfer-learning-iwildcam-2019 file: 4f592410cfa1e8fa682826d0878824a1.json
row: 1848 item: 57 link: https://www.kaggle.com/kaiska/facial-recognition-competition-using-fastai file: 4f8d235a045651d777091f0ae4a53200.json
row: 1849 item: 57 link: https://www.kaggle.com/mrtroll/analying-tfidf-biology-corpus file: 4faa269cf2e2be68cdf429822d694314.json
row: 1850 item: 57 link: https://www.kaggle.com/lbronchal/without-breaking-ciphers-0-48-lb file: 4fba14ab0c7f7e7394fa0cadc3cd71cb.json
row: 1851 item: 57 link: https://www.kaggle.com/agehsbarg/audio-challenge-cnn-with-concatenated-inputs file: 501de758c6f07249ad385f7eba6cf5ea.json
row: 1852 item: 57 link: htt

row: 1984 item: 57 link: https://www.kaggle.com/robotdreams/one-cycle-policy-with-keras file: 61573cc3cd1775b57247543a8c8c3dbc.json
row: 1985 item: 57 link: https://www.kaggle.com/ateplyuk/pytorch-efficientnet file: 6167c484c9dc86fb267e15d984479dd9.json
row: 1986 item: 57 link: https://www.kaggle.com/achalshah/allstate-feature-analysis-python file: 616e28d1e94a91d9a959734d35d1b982.json
row: 1987 item: 57 link: https://www.kaggle.com/gdoteof/pytorch-bert-baseline-wd-epochs-cnn-lstm file: 617f531e0d2ace43e9a890f9d49f2588.json
row: 1988 item: 57 link: https://www.kaggle.com/viveksrinivasan/eda-ensemble-model-top-10-percentile file: 61842c0496ca81fa1d38ecabadfdd086.json
row: 1989 item: 57 link: https://www.kaggle.com/pestipeti/bengali-quick-eda file: 61923647bc6adfd0fb971264dc55978b.json
row: 1990 item: 57 link: https://www.kaggle.com/leighplt/pytorch-tutorial-dataset-data-preparetion-stage file: 619c10efd745c6cb689bf0269dca277b.json
row: 1991 item: 57 link: https://www.kaggle.com/mathorma

row: 2115 item: 57 link: https://www.kaggle.com/theoviel/starter-code-eda-and-lgbm-baseline file: 715d8a7fb46ed18726698ae09037883b.json
row: 2116 item: 57 link: https://www.kaggle.com/den3b81/better-predictions-stacking-with-votingclassifier file: 718225155940480796f9341349e924fe.json
row: 2117 item: 57 link: https://www.kaggle.com/jionie/tta-power-densenet169 file: 71838b6ee070682c956e8179465207e1.json
row: 2118 item: 57 link: https://www.kaggle.com/mm5631/ml-workflow-data-science-approach file: 71cc3151d3441755722da87a236d2c60.json
row: 2119 item: 57 link: https://www.kaggle.com/ogrellier/using-classification-for-predictions file: 71dc7ff490fe9ba1baa733e15b791eda.json
row: 2120 item: 57 link: https://www.kaggle.com/muhammadumerbhutta/facial-recognition-competition-using-fastai file: 71ff5628c8f60a7f2b7310f5733a2c0d.json
row: 2121 item: 57 link: https://www.kaggle.com/rohanrao/birdcall-eda-chirp-hoot-and-flutter file: 7216279f7a3114e6ade049aa0dae61f8.json
row: 2122 item: 57 link: http

row: 2238 item: 57 link: https://www.kaggle.com/kmader/file-features-based-submission file: 80deeae6ed35aa2bc9cdb982f9f9bcdb.json
row: 2239 item: 57 link: https://www.kaggle.com/corochann/google-quest-first-data-introduction file: 810cd496a3f7670c4fcc00266d9afdb6.json
row: 2240 item: 57 link: https://www.kaggle.com/dktalaicha/covid-19-forecasting-week-4-arima file: 81145bd67a0dccc5455efaa77d431704.json
row: 2241 item: 57 link: https://www.kaggle.com/shivamb/dataset-decomposition-techniques file: 812c2441beb5bd0e44689cc12c7bcbde.json
row: 2242 item: 57 link: https://www.kaggle.com/simakov/keras-multilabel-neural-network-v1-2 file: 8183e05018716ed3ff7eff49aa006028.json
row: 2243 item: 57 link: https://www.kaggle.com/kmader/vgg16-u-net-on-carvana file: 81a992aef0fc98f840c98d3d3bd7e967.json
row: 2244 item: 57 link: https://www.kaggle.com/onlyshadow/a-practical-guide-to-ny-taxi-data-0-379 file: 81cc02bc856769af98013b185da37e52.json
row: 2245 item: 57 link: https://www.kaggle.com/akensert/rs

row: 2376 item: 57 link: https://www.kaggle.com/jesucristo/memorizer-cgan-for-dummies file: 91dcba8fec2533f9b4a7f83040a83009.json
row: 2377 item: 57 link: https://www.kaggle.com/willkoehrsen/featuretools-for-good file: 91e4209d122f3174cfb77e77539b3ea4.json
row: 2378 item: 57 link: https://www.kaggle.com/kmader/spectrogram-classifier-mobilenet file: 91ebf96ba42d5e7182ec6c938de8d001.json
row: 2379 item: 57 link: https://www.kaggle.com/mobassir/jigsaw-google-q-a-eda file: 91efda28a8462c6b5844334663eb1474.json
row: 2380 item: 57 link: https://www.kaggle.com/mjbahmani/statistical-analysis-for-elo file: 91f54758a8ef63aad24aa8401c62b165.json
row: 2381 item: 57 link: https://www.kaggle.com/artgor/even-more-features file: 920c79e7c3a4092dc75b9c6d1ac92364.json
row: 2382 item: 57 link: https://www.kaggle.com/benanakca/kannada-mnist-cnn-tutorial-with-app-top-2 file: 92266a97ee633e016bce7abf8e7f1636.json
row: 2383 item: 57 link: https://www.kaggle.com/batzner/gini-coefficient-an-intuitive-explanati

row: 2514 item: 57 link: https://www.kaggle.com/kplauritzen/elo-ratings-in-python file: a1a8f60052fba8cc9b4494d647c8ea71.json
row: 2515 item: 57 link: https://www.kaggle.com/marketneutral/eda-what-does-mktres-mean file: a1bed3df1d495f503e1b7450850cb5c9.json
row: 2516 item: 57 link: https://www.kaggle.com/vijaybj/basic-u-net-using-tensorflow file: a1cc56fb869c399caa4a1e418188eceb.json
row: 2517 item: 57 link: https://www.kaggle.com/shivamb/objects-bounding-boxes-using-resnet50-imageai file: a1ccb09e87c2731aae66e2bb7dd85927.json
row: 2518 item: 57 link: https://www.kaggle.com/xhlulu/dsb-2019-simple-lgbm-using-aggregated-data file: a207270b4f31671617b712d4e2018140.json
row: 2519 item: 57 link: https://www.kaggle.com/ratan123/m5-forecasting-lightgbm-with-timeseries-splits file: a20935cd799a19f9051b5a3eb0370c72.json
row: 2520 item: 57 link: https://www.kaggle.com/joseleiva/massey-s-ordinal-s-ordinals file: a20d0725c89b5b178f8320ac0b8b9429.json
row: 2521 item: 57 link: https://www.kaggle.com

row: 2649 item: 57 link: https://www.kaggle.com/osciiart/understanding-lwlrap-sorry-it-s-in-japanese file: b23e5a169c540ab387e54f2c215cae20.json
row: 2650 item: 57 link: https://www.kaggle.com/gidutz/starter-kernel-recursion-pharmaceuticals file: b247574de4f2941c88f037b2915a8232.json
row: 2651 item: 57 link: https://www.kaggle.com/khoongweihao/autonomous-driving-epoch-on-fire-resnext50 file: b24783248cf023d6573e13f03e1b3c4d.json
row: 2652 item: 57 link: https://www.kaggle.com/paultimothymooney/explore-image-metadata-s5p-gfs-gldas file: b253d605593e78605491d55686d64185.json
row: 2653 item: 57 link: https://www.kaggle.com/msheriey/flowers-on-tpu-ensemble-lr-schedule file: b265f59303fc63f4963cb120a0b9c692.json
row: 2654 item: 57 link: https://www.kaggle.com/quadmx08/monsters-first-submission file: b2670151fe54f55a2b20e348adb47da9.json
row: 2655 item: 57 link: https://www.kaggle.com/theoviel/winner-winner-turkey-dinner-lb-0-990 file: b290d568bf89948303d41a49e6465424.json
row: 2656 item: 57

row: 2785 item: 57 link: https://www.kaggle.com/kashnitsky/simple-logistic-regression-bert-0-27-lb file: c4628549ae389b20f75acfe90974d76e.json
row: 2786 item: 57 link: https://www.kaggle.com/robikscube/autonomous-driving-introduction-data-review file: c4720728e28b1adf1c709c25233fc929.json
row: 2787 item: 57 link: https://www.kaggle.com/aiswaryaramachandran/eda-and-feature-engineering file: c4aeae95c7965baff563b766e89369a2.json
row: 2788 item: 57 link: https://www.kaggle.com/maheshdadhich/i-will-sell-everything-for-free-0-55 file: c4beb2f9a26e68abcd744d8adda0a4ff.json
row: 2789 item: 57 link: https://www.kaggle.com/cdeotte/support-vector-machine-0-925 file: c4c15992ad87d00c87177cd5cb4d6009.json
row: 2790 item: 57 link: https://www.kaggle.com/shreeharjoshi/classifying-the-type-of-forest file: c50531b864db665e7447dcc5be366c86.json
row: 2791 item: 57 link: https://www.kaggle.com/khandelwallaksya/eda-and-generation file: c50d7f16b46912dbb264d0e65b711a0c.json
row: 2792 item: 57 link: https:/

row: 2919 item: 57 link: https://www.kaggle.com/mahmoud86/eda-rdf-boosting-kn file: d37e6278d3d2a60f4b650d524839501e.json
row: 2920 item: 57 link: https://www.kaggle.com/sudhirnl7/simple-electron-volt-predictor file: d382e7dc1dc036180a3cfe06ba3ab3ae.json
row: 2921 item: 57 link: https://www.kaggle.com/ironben/rdbs-to-graphdb-neo4j-network-approach file: d3999c6848f656e52cf83b772614a3c6.json
row: 2922 item: 57 link: https://www.kaggle.com/nguyenhoa/dog-cat-classifier-gradcam-with-tensorflow-2-0 file: d399a657e096b32d132872ec23847bd9.json
row: 2923 item: 57 link: https://www.kaggle.com/kool777/lyft-level5-eda-training-inference file: d39de44aaf5e62f49f64adf24f7894d2.json
row: 2924 item: 57 link: https://www.kaggle.com/jeru666/zillow-revamped-with-memory-reduction file: d3dd637831e1ad8b77dc1eb31d048325.json
row: 2925 item: 57 link: https://www.kaggle.com/karanjakhar/facial-keypoint-detection file: d401917ee033b44a440bcd11a3b7fc2a.json
row: 2926 item: 57 link: https://www.kaggle.com/redwan

row: 3056 item: 57 link: https://www.kaggle.com/ynue21/random-act-of-pizza file: e35dd26eaca484a18e0e5622a846edab.json
row: 3057 item: 57 link: https://www.kaggle.com/mayukh18/dataset-exploration-and-simple-on-the-fly-training file: e383cbf2646ca57f7f622ec79268663d.json
row: 3058 item: 57 link: https://www.kaggle.com/bobcz3/exploring-bnp-data-distributions file: e3b2b594ccb3718c1193958c378b1ef9.json
row: 3059 item: 57 link: https://www.kaggle.com/kylehounslow/a-method-for-finding-leaked-images-in-test-set file: e3b9bc28e02454f9d62e2acbd7136b77.json
row: 3060 item: 57 link: https://www.kaggle.com/orpitadas/amazon-catboost-shap-hyperopt-90-auc-silvermedal file: e3e2add74362dd9b66e27bbcce5ae289.json
row: 3061 item: 57 link: https://www.kaggle.com/kmader/nuclei-overview-to-submission file: e3e66831c8a04f2566ce6bb9c3b5e14a.json
row: 3062 item: 57 link: https://www.kaggle.com/davidthaler/duplicates-of-duplicates file: e40a02ec033674e26caeb6d2c5064a9e.json
row: 3063 item: 57 link: https://www

row: 3189 item: 57 link: https://www.kaggle.com/bibuml/beating-cvxtz-very-good-code-0-38-to-0-42 file: f4394152c1d80586412805fd262d9497.json
row: 3190 item: 57 link: https://www.kaggle.com/robertsturrock/prediction-using-fivethirtyeight-elo-ratings file: f4399cdfe4daabfd98ba2a3c5e971da4.json
row: 3191 item: 57 link: https://www.kaggle.com/davidmezzetti/trec-covid-submission file: f4429ee71e1c98f06bdf9e7a93bf8f0d.json
row: 3192 item: 57 link: https://www.kaggle.com/zinovadr/nfl-tableau-analysis file: f496472e172929d0036e0d6d0186e6fa.json
row: 3193 item: 57 link: https://www.kaggle.com/patilpramod2157/transfer-learning-using-inception-resnet-v2 file: f4c05d6b4fec14a0c6acc938b6d5fdf3.json
row: 3194 item: 57 link: https://www.kaggle.com/hiralmshah/robot-sensor-eda-fe-and-prediction-improvement file: f4c558fd0a91f70fc2d2b3d9027e13a3.json
row: 3195 item: 57 link: https://www.kaggle.com/dferhadi/covid-19-predictions-growth-factor-and-calculus file: f520673f28ed3ba218f106ca75bf65d1.json
row: 3

row: 3324 item: 57 link: https://mlart.co/item/compose-violin-music-with-prompts-trained-with-a-combination-of-a-cnn-and-rnn_-and-performed file: 1dc09112b4378a773aa3305cb90f6c70.json
row: 3325 item: 57 link: https://mlart.co/item/using-a-gan-to-generate-architecture-plans file: 1dd40c1440fd5a16962c9a9770b6acea.json
row: 3326 item: 57 link: https://mlart.co/item/generate-lyrics-with-an-vae-trained-with-seven-different-styles file: 1ee5ccf71bb506dfcb73f0483ed331cc.json
row: 3327 item: 57 link: https://mlart.co/item/classifies-drawings-and-matches-them-with-professional-graphics file: 1f16868747c36ca1e5172e6cd5c216f0.json
row: 3328 item: 57 link: https://mlart.co/item/finger-pressure-sensors_-a-breath-pressure-and-embouchure-sensor-on-the-mouthpiece-mapped-with-an-ann-to-generate-sound-via-a-microcontroller file: 21082913c712832481061a513ba6dd8b.json
row: 3329 item: 57 link: https://mlart.co/item/a-collaborative-art-tool-for-mixing-the-latent-space-of-different-gan-models-to-produce-new-

row: 3458 item: 57 link: https://mlart.co/item/train-a-gan-on-nike-shoes_-generate-an-image_-and-produce-a-shoe-collection-according-to-the-generated-image file: 7b6e139722b49806abaa0115a464a857.json
row: 3459 item: 57 link: https://mlart.co/item/tom-cruise-deep-fake file: 7ba50ca6e84aaaebdf6fbe44aab74fa5.json
row: 3460 item: 57 link: https://mlart.co/item/visualize-the-latent-representation-of-a-grandmother-in-a-cnn file: 7c26c833cf6c4423b654552c27067079.json
row: 3461 item: 57 link: https://mlart.co/item/visualize-the-entire-imagenet-dataset-in-vertical-rows file: 7c28c16474d28f172b40e5338dbe195a.json
row: 3462 item: 57 link: https://mlart.co/item/a-computer-looks-at-itself-with-a-camera-and-translates-image-features-into-text-with-an-rnn file: 7c7e9cbfe72151ba794c06cf30e3ed6d.json
row: 3463 item: 57 link: https://mlart.co/item/gan-generated-japanese-wood-prints_-and-then-print-them-using-the-same-technique file: 7c8c27789fb46affb51c418198667564.json
row: 3464 item: 57 link: https://

row: 3590 item: 57 link: https://mlart.co/item/translate-online-maps-to-19th-century-maps-in-real-time-with-cyclegan file: d4cb44570a85b80ec7b17dd71543f73f.json
row: 3591 item: 57 link: https://mlart.co/item/stylegan-interpolation_-trained-on-karl-blossfeldt_s-herbarium-_1928 file: d5e358e4f01cb820de380d3d73740c63.json
row: 3592 item: 57 link: https://mlart.co/item/rnn-generated-fake-articles-seeded-by-real-world-article-titles file: d6ac2fc6448decbd3ca6c8241998505c.json
row: 3593 item: 57 link: https://mlart.co/item/visualize-the-layers-of-a-cnn-with-your-camera file: d87d2b9196fc0fbe8ade924a065208cf.json
row: 3594 item: 57 link: https://mlart.co/item/apply-cyclegan-to-a-sketch-of-the-fonthill-castle file: d8816f879487a1e32bdea87dec5eb904.json
row: 3595 item: 57 link: https://mlart.co/item/generate-chairs-with-a-gan-and-then-produce-the-generated-chairs file: d8ad874798bbc895ef84fb5d04543aa6.json
row: 3596 item: 57 link: https://mlart.co/item/a-stylegan-trained-on-eboy_s-database-of-p

row: 3712 item: 57 link: https://thecleverprogrammer.com/2020/05/23/data-science-project-book-recommendation-system-with-machine-learning/ file: 57a4eca27828c3e02d5f3bd071d10105.json
row: 3713 item: 57 link: https://thecleverprogrammer.com/2020/11/12/earthquake-prediction-model-with-machine-learning/ file: 584b19de20fa8800ca1e2a06f3d07d1a.json
row: 3714 item: 57 link: https://thecleverprogrammer.com/2020/12/01/keyword-extraction-with-python/ file: 59846dca6a174590f8178aa552943f49.json
row: 3715 item: 57 link: https://thecleverprogrammer.com/2020/06/12/sms-spam-detection-with-machine-learning/ file: 59e85ba018cc2bc0e2453fa16d94c684.json
row: 3716 item: 57 link: https://thecleverprogrammer.com/2020/09/06/lung-segmentation-with-machine-learning/ file: 5a47f5158b0d0755932c0931ca8fd23c.json
row: 3717 item: 57 link: https://thecleverprogrammer.com/2020/08/30/image-classification-with-tensorflow-in-machine-learning/ file: 5a7eea5788be2ef6e7c4aa69e5cf5c5c.json
row: 3718 item: 57 link: https://