## Terraria Webscraper

The following code scrapes item data from the Terraria wiki's [Item IDs page](https://terraria.fandom.com/wiki/Item_IDs).

We gather the following data on each item:
 * Name of item (as referred to in game)
 * Image associated with item (as displayed in game)
 * Item ID
 * Type of item (requires fetching each item's own wiki page)

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

url = "https://terraria.fandom.com/wiki/Item_IDs"
data = requests.get(url, timeout=30)
data.raise_for_status()  # Stop early if the page could not be fetched

soup = BeautifulSoup(data.text, 'html.parser')
table = soup.find('table', class_=['lined', 'sortable', 'jquery-tablesorter'])

results = []

if table:
    rows = table.find_all('tr')
    for row in rows[1:]:
        cols = row.find_all('td')
        # Skip rows that lack the three expected cells (ID, image, name)
        if len(cols) < 3:
            continue
        # Store item id, name, and image link
        item_id = cols[0].text.strip()
        name = cols[2].text.strip()
        img = cols[1].find('img')

        # Case 1: the tag references both a GIF and a static image, so data-src holds the static image URL
        image_url = img.get('data-src') if img else None
        # Case 2: no GIF (no data-src value), so fall back to the plain src attribute
        if not image_url and img:
            image_url = img.get('src')

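        # Also fetch each item's page link for the per-item scraping done later
        link_element = cols[2].find('a')
        item_url = link_element.get('href') if link_element else None
        # Note: when no link is found, this produces a URL ending in 'None',
        # which the follow-up scraping loop checks for and skips
        item_url = f'https://terraria.fandom.com{item_url}'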

        results.append({'ID': item_id, 'Name': name, 'Image': image_url, 'Link': item_url})

for result in results[:5]:
    print(result)


{'ID': '1', 'Name': 'Iron Pickaxe', 'Image': 'https://static.wikia.nocookie.net/terraria_gamepedia/images/a/a2/Iron_Pickaxe.png/revision/latest?cb=20200516214316', 'Link': 'https://terraria.fandom.com/wiki/Iron_Pickaxe'}
{'ID': '2', 'Name': 'Dirt Block', 'Image': 'https://static.wikia.nocookie.net/terraria_gamepedia/images/5/55/Dirt_Block.png/revision/latest?cb=20200516211400', 'Link': 'https://terraria.fandom.com/wiki/Dirt_Block'}
{'ID': '3', 'Name': 'Stone Block', 'Image': 'https://static.wikia.nocookie.net/terraria_gamepedia/images/3/37/Stone_Block.png/revision/latest?cb=20200516222613', 'Link': 'https://terraria.fandom.com/wiki/Stone_Block'}
{'ID': '4', 'Name': 'Iron Broadsword', 'Image': 'https://static.wikia.nocookie.net/terraria_gamepedia/images/c/cf/Iron_Broadsword.png/revision/latest?cb=20221121015053', 'Link': 'https://terraria.fandom.com/wiki/Iron_Broadsword'}
{'ID': '5', 'Name': 'Mushroom', 'Image': 'https://static.wikia.nocookie.net/terraria_gamepedia/images/8/8c/Mushroom.
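
The two image cases and the link construction above can also be written as small standalone helpers. This is only a sketch under assumptions: `pick_image_url`, `absolute_link`, and `BASE_URL` are hypothetical names that do not appear in the notebook, the `example.invalid` URLs are placeholders, and the helpers take a plain attribute dictionary (for instance a BeautifulSoup tag's `.attrs`) so they can be tried without a network request.

```python
from urllib.parse import urljoin

# Assumed base URL, taken from the link prefix used in the scraping cell
BASE_URL = "https://terraria.fandom.com"


def pick_image_url(img_attrs):
    """Prefer the 'data-src' attribute and fall back to 'src' (hypothetical helper)."""
    if not img_attrs:
        return None
    return img_attrs.get("data-src") or img_attrs.get("src")


def absolute_link(href):
    """Resolve a site-relative href against BASE_URL; a missing href stays None."""
    return urljoin(BASE_URL, href) if href else None


# Case 1: both attributes present, so data-src wins
print(pick_image_url({"src": "placeholder.gif",
                      "data-src": "https://example.invalid/item.png"}))
# Case 2: only src present
print(pick_image_url({"src": "https://example.invalid/item.png"}))
# Relative link from the table resolved to a full URL
print(absolute_link("/wiki/Iron_Pickaxe"))
# A row without a link yields None rather than a URL ending in 'None'
print(absolute_link(None))
```

Returning `None` for a missing link is a design choice of this sketch; the cell above instead builds a string ending in `None`, which the later scraping loop checks for.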

### Further Scraping

So far we have each item's ID, name, static image link, and page link. Next, we want the type of each item.

Getting it requires one extra request per item: we fetch each item's own page and read the type from the stat table on that page, pausing between requests.

In [None]:
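import time

for result in results:
    url = result['Link']
    # Default to None so every record has a 'Type' key, even if the lookup fails
    result['Type'] = None
    # Links built from a missing href end in 'None'; skip those
    if not url or 'None' in url:
        continue
    data = requests.get(url)
    soup = BeautifulSoup(data.text, 'html.parser')
    # The item type is the first tag link inside the page's stat table
    table = soup.find('table', class_='stat')
    tag_span = table.find('span', class_='nowrap tag') if table else None
    link = tag_span.find('a') if tag_span else None
    if link:
        result['Type'] = link.text.strip()
    # Pause between requests to avoid hammering the wiki
    time.sleep(1)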

for result in results[:5]:
    print(result)
    

{'ID': '1', 'Name': 'Iron Pickaxe', 'Image': 'https://static.wikia.nocookie.net/terraria_gamepedia/images/a/a2/Iron_Pickaxe.png/revision/latest?cb=20200516214316', 'Link': 'https://terraria.fandom.com/wiki/Iron_Pickaxe', 'Type': 'Tool'}
{'ID': '2', 'Name': 'Dirt Block', 'Image': 'https://static.wikia.nocookie.net/terraria_gamepedia/images/5/55/Dirt_Block.png/revision/latest?cb=20200516211400', 'Link': 'https://terraria.fandom.com/wiki/Dirt_Block', 'Type': 'Block'}
{'ID': '3', 'Name': 'Stone Block', 'Image': 'https://static.wikia.nocookie.net/terraria_gamepedia/images/3/37/Stone_Block.png/revision/latest?cb=20200516222613', 'Link': 'https://terraria.fandom.com/wiki/Stone_Block', 'Type': 'Block'}
{'ID': '4', 'Name': 'Iron Broadsword', 'Image': 'https://static.wikia.nocookie.net/terraria_gamepedia/images/c/cf/Iron_Broadsword.png/revision/latest?cb=20221121015053', 'Link': 'https://terraria.fandom.com/wiki/Iron_Broadsword', 'Type': 'Weapon'}
{'ID': '5', 'Name': 'Mushroom', 'Image': 'https:
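
The per-page lookup can be separated from the network request, which makes it possible to try it on a small piece of HTML. The following is a minimal sketch, assuming the same `stat` table and `nowrap tag` span that the loop above searches for; `extract_item_type` and `SAMPLE_PAGE` are hypothetical names, and the sample markup is a heavily simplified stand-in rather than a copy of a real wiki page.

```python
from bs4 import BeautifulSoup


def extract_item_type(html):
    """Return the first type tag in the page's stat table, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="stat")
    if table is None:
        return None
    tag_span = table.find("span", class_="nowrap tag")
    if tag_span is None:
        return None
    link = tag_span.find("a")
    return link.text.strip() if link else None


# Hypothetical, simplified markup used only to exercise the function
SAMPLE_PAGE = """
<table class="stat">
  <tr><th>Type</th>
      <td><span class="nowrap tag"><a href="/wiki/Tools">Tool</a></span></td></tr>
</table>
"""

print(extract_item_type(SAMPLE_PAGE))             # Tool
print(extract_item_type("<p>no stat table</p>"))  # None
```

With a function like this, the scraping loop only has to fetch `data.text` and pass it along, and pages that lack the expected table simply produce `None`.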

## Conclusion

We now have the basic information needed for an image classification task on Terraria inventories: each item's ID, name, image link, and type.

In [5]:
df = pd.DataFrame(results)
# index=False keeps the DataFrame's row index out of the CSV
df.to_csv("items.csv", index=False)
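
Before using the types as classification labels, it may be worth checking how many items fall under each type and how many have none. Here is a small sketch, assuming records shaped like the `results` dictionaries; `type_counts` is a hypothetical helper, the first four sample records mirror the output shown earlier, and the last one is an invented example of an item whose type lookup failed.

```python
from collections import Counter


def type_counts(records):
    """Count records per 'Type', grouping missing or empty types as 'Unknown'."""
    return Counter(record.get("Type") or "Unknown" for record in records)


sample = [
    {"ID": "1", "Name": "Iron Pickaxe", "Type": "Tool"},
    {"ID": "2", "Name": "Dirt Block", "Type": "Block"},
    {"ID": "3", "Name": "Stone Block", "Type": "Block"},
    {"ID": "4", "Name": "Iron Broadsword", "Type": "Weapon"},
    {"ID": "0", "Name": "Hypothetical item", "Type": None},  # invented record
]

counts = type_counts(sample)
print(counts["Block"])    # 2
print(counts["Unknown"])  # 1
```

Calling `type_counts(results)` on the scraped data would show whether some types are too rare, or too many items are untyped, for the intended task.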