# Yugioh TCG Complete Card Database
This notebook will perform web scraping to load all available cards and their sets from scratch. This is divided into 2 main processing tasks and an additional merge from 3rd party sources for the collectibles market. The first 2 tasks are done in this notebook.

  1. **Indexing**: This will index every unique card as it relates to the trading card game which official rules will treat identically
  2. **Expanding**: Each unique card has multiple rarities, releases, and sets and this task will find them all.
  3. **Merging**: Once we have the data from official sources, we can merge data from 3rd parties like TCGPlayer to understand things like the price.

## Rules
The code is meant to not only be compliant but respectful to the web servers that have the data. The scraper will not only follow robots.txt but also optimize further. In the Appendix, there's code that provides an aggressive way that is literally 1000x faster but will overwhelm the data sources.

In [1]:
import requests
from bs4 import BeautifulSoup
import datetime
print(requests.get('https://tcgplayer.com/robots.txt').text)

User-agent: *
Crawl-Delay: 10
Allow: /
Disallow: /*?*seller=*
Disallow: /login
Disallow: /search/articles
Disallow: /content/magic-the-gathering/deck/
Disallow: /content/disney-lorcana/deck/
Disallow: /content/yugioh/deck/
Disallow: /content/pokemon/deck/
Disallow: /content/flesh-and-blood/deck/
Sitemap: https://www.tcgplayer.com/sitemap/index.xml



In [2]:
# 404 file not found response, no safeguards
requests.get('https://www.yugioh-card.com/robots.txt')

<Response [404]>

## 1. Index Configuration & Example
Configuration details including the URL's that need to be predefined are done here. A basic example is provided for understanding and to make sure it will work in the environment. Then, we loop through all cards based on the Example.

In [3]:
# this is the card search from the front end where we are going to start loading the cards
import utils
config = {
    'base-url' : 'https://www.db.yugioh-card.com/yugiohdb/card_search.action?ope=1&sess=1&rp=10&mode=&sort=1&keyword=&stype=1&ctype=&othercon=2&starfr=&starto=&pscalefr=&pscaleto=&linkmarkerfr=&linkmarkerto=&link_m=2&atkfr=&atkto=&deffr=&defto=&releaseDStart=1&releaseMStart=1&releaseYStart=1999&releaseDEnd=&releaseMEnd=&releaseYEnd=&page=',
    'index-url' : 'https://www.db.yugioh-card.com/yugiohdb/card_search.action?ope=2&cid=',
    'link-monster-img': 'https://www.db.yugioh-card.com/yugiohdb/external/image/parts/link_pc/link{P}.png',
    'name' : f"yugioh-index-expand-{datetime.datetime.now().strftime('%Y%b%d-%H%M%S').upper()}"
}

In [4]:
config['name'] = f"yugioh-index-expand-{utils.timestamp()}"
config['name']

'yugioh-index-expand-2025SEP05-111811'

In [5]:
example = requests.get(config['base-url'])

In [6]:
soup = BeautifulSoup(example.text, 'html.parser')

In [7]:
soup


<!-- ///CardResultNormal  -->
<!DOCTYPE html>

<html>
<head>
<!-- ///MetaCardResultNormal  -->
<!-- ///MetaComon  -->
<meta content="IE=10" http-equiv="x-ua-compatible"/>
<meta content="IE=EmulateIE10" http-equiv="x-ua-compatible"/>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<meta content="en" http-equiv="Content-Language"/>
<meta content="text/css" http-equiv="Content-Style-Type"/>
<meta content="text/javascript" http-equiv="Content-Script-Type"/>
<meta content="no" http-equiv="imagetoolbar"/>
<meta content="telephone=no" name="format-detection"/>
<meta content="Yu-Gi-Oh! Neuron(TRADING CARD GAME CARD DATABASE)" property="og:site_name"/>
<meta content="http://www.db.yugioh-card.com/sns/logo_ocg_f.png" property="og:image">
<meta content="product" property="og:type">
<meta content="en_US" property="og:locale">
<meta content="summary" name="twitter:card">
<meta content="http://www.db.yugioh-card.com/sns/logo_ocg_t.png" name="twitter:image">
<script src="/yugioh

In [8]:
hand = soup.find_all('div', class_ = 't_row')

### Scrape Index
The Yugioh card index from Konami is paginated with 10 cards in each page. We will go through and parse all cards on each page. The following functions will process the page, or `hand`, and add them to results

In [9]:
def transform_hand(hand):
    '''
    Takes in the HTML row and returns an appropriate dictionary

    Description: Description of the card's effects or abilities.
    Name: The card's name.
    Attribute: The attribute of the monster (e.g., LIGHT, DARK).
    Level/Rank: item_box_valueLevel for normal monsters or rank for Xyz monsters.
    Attack: Attack points of the monster.
    Defense: Defense points of the monster.
    Type: Card type (e.g., Monster, Spell, Trap).
    SubType: Specific type within the card type (e.g., Dragon, Warrior).
    '''
    results = []
    for card in hand:
        soup = BeautifulSoup(str(card), 'html.parser')
        # these are fields shared by all cards
        
        result = {
            'index': soup.find('input', {'class': 'cid'})['value'],
            'name': soup.find('span', {'class': 'card_name'}).text.strip(),
            'description': soup.find('dd', {'class': 'box_card_text'}).text.strip(),
            'type': soup.find('span', {'class': 'box_card_attribute'}).text.strip()
        }

        # based on the type, each card can have different fields
        if result['type'] in ['SPELL', 'TRAP'] :
            sub_type = soup.find('span', {'class': 'box_card_effect'})
            result.update({
                'sub_type': sub_type.text.strip() if sub_type else '',
                'attribute': '',
                'rank': '',
                'attack': '',
                'defense': ''
            })
        else : # MONSTER card
            # check if link card
            link = soup.find('span', {'class': 'box_card_linkmarker'})
            if link :
                rank = soup.find('span', {'class': 'box_card_linkmarker'}).text.strip()
                links = soup.find('img', {'title': 'Link'})['src'].split('/')[-1].replace('.png', '').replace('link', '')
                rank = f'{rank} P{links}'
            else:
                rank = soup.find('span', {'class': 'box_card_level_rank'}).text.strip()
            result.update({
                'type': 'MONSTER',
                'attribute': result['type'],
                'sub_type': soup.find('span', {'class': 'card_info_species_and_other_item'}).text.replace('\r\n', '').replace('\t', ''),
                'rank': rank,
                'attack': soup.find('span', {'class': 'atk_power'}).text.strip().split()[-1],
                'defense': soup.find('span', {'class': 'def_power'}).text.strip().split()[-1]
            })
        
        results.append(result)
    
    return results

In [10]:
cards = transform_hand(hand)

In [11]:
page = 2 # we're on page 2 after example
hand = cards

In [12]:
# iterate through database and collect links to cards
while len(hand) > 0:
  print(page)
  data = None
  
  while data is None:
    try: # connect
      data = requests.get(config['base-url'] + str(page))
    except:
      print('trying again')
      pass
  
  soup = BeautifulSoup(data.text, 'html.parser')
  hand = transform_hand(soup.find_all('div', class_ = 't_row'))
  cards.extend(hand)
  page += 1

2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
27

In [13]:
len(cards) # this was 13327 on last run

13327

In [14]:
deck = cards

### Save Index
We will save out the index to a CSV file. Please make sure to rename it.

In [15]:
import pandas as pd

In [16]:
df = pd.DataFrame(deck)

In [17]:
df

Unnamed: 0,index,name,description,type,sub_type,attribute,rank,attack,defense
0,21385,"""A Case for K9""","When this card is activated: You can add 1 ""K9...",SPELL,Continuous,,,,
1,7128,"""A"" Cell Breeding Device","During each of your Standby Phases, put 1 A-Co...",SPELL,Continuous,,,,
2,7315,"""A"" Cell Incubator",Each time an A-Counter(s) is removed from play...,SPELL,Continuous,,,,
3,12653,"""A"" Cell Recombination Device",Target 1 face-up monster on the field; send 1 ...,SPELL,Quick-Play,,,,
4,6994,"""A"" Cell Scatter Burst","Select 1 face-up ""Alien"" monster you control. ...",SPELL,Quick-Play,,,,
...,...,...,...,...,...,...,...,...,...
13322,10844,ZW - Sleipnir Mail,"You can target 1 ""Utopia"" monster you control;...",MONSTER,[Beast／Effect],LIGHT,Level 4,1000,1000
13323,16419,ZW - Sylphid Wing,"You can only control 1 ""ZW - Sylphid Wing"". Yo...",MONSTER,[Beast／Effect],LIGHT,Level 4,800,1600
13324,10365,ZW - Tornado Bringer,"You can target 1 ""Utopia"" monster you control;...",MONSTER,[Dragon／Effect],WIND,Level 5,1300,1800
13325,10366,ZW - Ultimate Shield,When this card is Normal or Special Summoned: ...,MONSTER,[Aqua／Effect],EARTH,Level 4,0,2000


In [18]:
df.to_csv(f"{config['name']}.csv") # CSV to maintain open source compatibility

In [20]:
df = pd.read_csv(f"{config['name']}.csv")

## 2. Expand Configuration and Example
Each card has a variety of information that can be parsed from the detailed page. This includes release dates and sets the part was part of.

In [21]:
example = requests.get(config['index-url'] + str(df.iloc[456]['index']))

In [22]:
#example.text

In [23]:
deck = list(df['index'])
print(len(deck))
print(deck[:10])

13327
[21385, 7128, 7315, 12653, 6994, 18843, 15287, 15288, 15289, 11391]


In [24]:
with open('example.html', 'w+') as f :
    f.write(example.text)

### Expand Scrape
Based on the index parameters, we can now query the official Konami database to request all of the details available about the card. Because there is so much data, we will save out all details and then process it. The data is the same from the card website and each HTML file will be saved out into a database using SQLite.

In [25]:
import os
import sqlite3

# Connect to SQLite database with self-contained .db file
db_name = f"{config['name']}.db"
conn = sqlite3.connect(db_name)
cursor = conn.cursor()

# Create table if it doesn't exist
cursor.execute('''
CREATE TABLE IF NOT EXISTS html_files (
    ygo_index TEXT,
    valid TEXT,
    content TEXT
)
''')
# Get list of already completed indices with matching 'valid' value
completed = {row[0] for row in cursor.execute('SELECT ygo_index FROM html_files WHERE valid = ?', (config['name'],))}

for i, index in enumerate(deck) :
    print(f'\rCard {i} ID: {index}{" "*25}', end='')
    
    if str(index) in completed:
        print(f'Found {index} already with matching valid name, skipping.')
        continue
    
    response = None
    while (not response) :
        try:
            response = requests.get(config['index-url'] + str(index))
        except Exception as e:
            print(f'{e}\nRetrying')
            response = None
    if response.ok :
        try:
            # Insert HTML content into the database
            cursor.execute('INSERT INTO html_files (ygo_index, valid, content) VALUES (?, ?, ?)',
                           (index, config['name'], response.text))
            conn.commit()
            print(f'Successfully stored {index}.')
        except Exception as e:
            print(e)
            conn.close()
            break
    else:
        print(f'Failed to retrieve {index}.')


Card 0 ID: 21385                         Successfully stored 21385.
Card 1 ID: 7128                         Successfully stored 7128.
Card 2 ID: 7315                         Successfully stored 7315.
Card 3 ID: 12653                         Successfully stored 12653.
Card 4 ID: 6994                         Successfully stored 6994.
Card 5 ID: 18843                         Successfully stored 18843.
Card 6 ID: 15287                         Successfully stored 15287.
Card 7 ID: 15288                         Successfully stored 15288.
Card 8 ID: 15289                         Successfully stored 15289.
Card 9 ID: 11391                         Successfully stored 11391.
Card 10 ID: 6032                         Successfully stored 6032.
Card 11 ID: 5139                         Successfully stored 5139.
Card 12 ID: 6053                         Successfully stored 6053.
Card 13 ID: 4446                         Successfully stored 4446.
Card 14 ID: 4806                         Successfully stor

In [26]:
# Query to count rows with the same 'valid' value as config['name']
cursor.execute('SELECT COUNT(*) FROM html_files WHERE valid = ?', (config['name'],))
count = cursor.fetchone()[0]

# Output the result
print(f'There are currently {count} cards in the database for this run ({config['name']}).')
conn.close() # this was 13044 as of last run, new data indicates there is more

There are currently 13327 cards in the database for this run (yugioh-index-expand-2025SEP05-111811).


In [27]:
# define relevant columns
index_df = df
df = df[['index', 'name', 'description', 'type',
       'sub_type', 'attribute', 'rank', 'attack', 'defense']]

In [28]:
def get_html_data(card, valid = config['name']):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM html_files WHERE ygo_index = ? AND valid = ?', (index, valid))
    row = cursor.fetchone()
    
    if row:
        print(f'Card found: Index: {row[0]}, Valid: {row[1]}, Content length: {len(row[2])}')
    else:
        print(f'Card with index {index} and valid {valid} not found.')

    cursor.close()
    return row[2]

In [29]:
# test it out
print(get_html_data('15287')[:50], '. . . ')

Card found: Index: 9879, Valid: yugioh-index-expand-2025SEP05-111811, Content length: 146657


<!-- ///CardDetail  -->

<!DOCTYPE html>
<h . . . 


In [30]:
from bs4 import BeautifulSoup

cards = []
for i, index in enumerate(df['index']) :
    # extract card from database
    card_default = df[df['index'] == index].to_dict(orient='records')[0]
    # extract card details
    data = get_html_data(index)
    soup = BeautifulSoup(data, 'html.parser')
    sets = soup.find('div', {'id': 'update_list'}).find_all('div', class_='t_row')
    # loaded all sets, now parse
    if len(sets) < 1 :
        print('error , no sets found: ' + card['name'])
    for s in sets :
        card = card_default.copy()
        # gets set name, release, card set id
        set_soup = BeautifulSoup(str(s), 'html.parser')
        card.update({
            'set_name': set_soup.find('div', {'class': 'pack_name'}).text.strip(),
            'set_id': set_soup.find('div', {'class': 'card_number'}).text.strip(),
            'set_release': set_soup.find('div', {'class': 'time'}).text.strip(),
            'rarity': ' '.join(set_soup.find('div', {'class': 'lr_icon'}).text.split())
        })
        cards.append(card)
        print(f"\rGenerated {card['set_id']} {card['rarity']} {i+1} {len(df)} {' '*50}", end='')

Card found: Index: 21385, Valid: yugioh-index-expand-2025SEP05-111811, Content length: 146272
Generated JUSH-EN040 STAR Starlight Rare 1 13327                                                   Card found: Index: 7128, Valid: yugioh-index-expand-2025SEP05-111811, Content length: 143110
Generated FOTB-EN043 C Common 2 13327                                                   Card found: Index: 7315, Valid: yugioh-index-expand-2025SEP05-111811, Content length: 143049
Generated GLAS-EN062 C Common 3 13327                                                   Card found: Index: 12653, Valid: yugioh-index-expand-2025SEP05-111811, Content length: 142621
Generated INOV-EN063 C Common 4 13327                                                   Card found: Index: 6994, Valid: yugioh-index-expand-2025SEP05-111811, Content length: 142521
Generated STON-EN041 C Common 5 13327                                                   Card found: Index: 18843, Valid: yugioh-index-expand-2025SEP05-111811, Content len

In [31]:
len(cards)

40408

## Finalize Complete Card Database
We will now save out our expanded version but make sure to change the filename. It is considered complete in terms of official data sources. Additional data like market prices are merged but this is considered 3rd party.

In [32]:
full_df = pd.DataFrame(cards)
full_df['name'] = full_df['name'].apply(lambda x: x.strip())
full_df

Unnamed: 0,index,name,description,type,sub_type,attribute,rank,attack,defense,set_name,set_id,set_release,rarity
0,21385,"""A Case for K9""","When this card is activated: You can add 1 ""K9...",SPELL,Continuous,,,,,Justice Hunters,JUSH-EN040,2025-08-01,SR Super Rare
1,21385,"""A Case for K9""","When this card is activated: You can add 1 ""K9...",SPELL,Continuous,,,,,Justice Hunters,JUSH-EN040,2025-08-01,STAR Starlight Rare
2,7128,"""A"" Cell Breeding Device","During each of your Standby Phases, put 1 A-Co...",SPELL,Continuous,,,,,FORCE OF THE BREAKER,FOTB-EN043,2007-05-16,C Common
3,7315,"""A"" Cell Incubator",Each time an A-Counter(s) is removed from play...,SPELL,Continuous,,,,,GLADIATOR'S ASSAULT,GLAS-EN062,2007-11-14,C Common
4,12653,"""A"" Cell Recombination Device",Target 1 face-up monster on the field; send 1 ...,SPELL,Quick-Play,,,,,INVASION: VENGEANCE,INOV-EN063,2016-11-04,C Common
...,...,...,...,...,...,...,...,...,...,...,...,...,...
40403,10366,ZW - Ultimate Shield,When this card is Normal or Special Summoned: ...,MONSTER,[Aqua／Effect],EARTH,Level 4,0,2000,COSMO BLAZER,CBLZ-EN007,2013-01-25,C Common
40404,9879,ZW - Unicorn Spear,"You can target 1 ""Number C39: Utopia Ray"" you ...",MONSTER,[Beast／Effect],LIGHT,Level 4,1900,0,STAR PACK 2014,SP14-EN004,2014-02-21,C Common
40405,9879,ZW - Unicorn Spear,"You can target 1 ""Number C39: Utopia Ray"" you ...",MONSTER,[Beast／Effect],LIGHT,Level 4,1900,0,STAR PACK 2014,SP14-EN004,2014-02-21,ST Starfoil
40406,9879,ZW - Unicorn Spear,"You can target 1 ""Number C39: Utopia Ray"" you ...",MONSTER,[Beast／Effect],LIGHT,Level 4,1900,0,SUPER STARTER V FOR VICTORY,YS13-EN018,2013-06-14,C Common


In [33]:
full_df.to_csv(f'{config['name']}.csv')

## Appendix

```python
# reference https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-python/
import concurrent.futures
import requests

MAX_RETRIES = 5 # Setting the maximum number of retries if we have failed requests to 5.
MAX_THREADS = 1000

def scrape(url):
    for _ in range(MAX_RETRIES):
        response = requests.get(config['index-url'] + url) # Scrape!

        if response.ok: # If we get a successful request
            print(index)
            index = url.split('=')[-1] #the cid parameter in the url
            with open(f'output/{index}.html', 'w+') as f:
                f.write(response.text)
            
        else: # If we get a failed request, then we continue the loop
            print(response.content)

with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
    executor.map(scrape, list(set(deck)))

import os
outputs = os.listdir('output/')
print(len(outputs))
```