# Predicting Animal Crossing Item Prices

Animal Crossing is a cute lifestyle game where you create your own island and customize it by crafting and buying different items. 

What factors determine item prices? Is it the type of item? Is it whether the item is craftable? Maybe it's the ability to interact with it? 

I try to find out which qualities of an item determine its price by creating a correlation heatmap, then using the most correlated features to build KNN (k-nearest neighbors) and linear regression models!

The dataset comes from Kaggle: https://www.kaggle.com/datasets/jessicali9530/animal-crossing-new-horizons-nookplaza-dataset
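The approach above (screen features by their correlation with price, then fit KNN and linear regression on the correlated ones) can be sketched end to end on toy data. This is a minimal illustration, not the notebook's actual modeling code: the toy columns, the synthetic `Sell` target, the 0.2 correlation cutoff, `n_neighbors=5`, and the 75/25 train/test split are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data standing in for a cleaned, numerically encoded housewares table
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "DIY": rng.integers(0, 2, n),             # 1 = craftable, 0 = not (assumed encoding)
    "Interact": rng.integers(0, 2, n),        # unrelated to price in this toy data
    "VariationCount": rng.integers(1, 9, n),
})
# Synthetic price driven by DIY and VariationCount plus noise (not real game data)
df["Sell"] = 800 * df["DIY"] + 100 * df["VariationCount"] + rng.normal(0, 50, n)

# Correlation of each feature with the target; sns.heatmap(df.corr()) would plot the full matrix
corr_with_target = df.corr()["Sell"].drop("Sell")
features = corr_with_target[corr_with_target.abs() > 0.2].index.tolist()

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Sell"], test_size=0.25, random_state=0
)

# Fit both model types on the selected features and compare test-set MSE
models = {"KNN": KNeighborsRegressor(n_neighbors=5), "Linear": LinearRegression()}
mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse[name] = mean_squared_error(y_test, model.predict(X_test))

print(features)
print(mse)
```

On real data the features would first need to be encoded as numbers (for example `Yes`/`No` to 1/0) before `corr()` can include them.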

In [22]:
#Importing all necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # pyplot is the plotting interface conventionally aliased as plt
import seaborn as sns
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Predicting Houseware Prices

First, we read in the data and explore it to figure out what needs to be cleaned.

In [23]:
housewares = pd.read_csv(r'C:\Users\Isabella\Documents\Projects\AC Price Predictor\animal crossing catalog\housewares.csv')
housewares.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3275 entries, 0 to 3274
Data columns (total 32 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Name               3275 non-null   object 
 1   Variation          3074 non-null   object 
 2   Body Title         1182 non-null   object 
 3   Pattern            1508 non-null   object 
 4   Pattern Title      1508 non-null   object 
 5   DIY                3275 non-null   object 
 6   Body Customize     3275 non-null   object 
 7   Pattern Customize  3275 non-null   object 
 8   Kit Cost           2239 non-null   float64
 9   Buy                3275 non-null   object 
 10  Sell               3275 non-null   int64  
 11  Color 1            3275 non-null   object 
 12  Color 2            3275 non-null   object 
 13  Size               3275 non-null   object 
 14  Miles Price        115 non-null    float64
 15  Source             3275 non-null   object 
 16  Source Notes       1870 
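`info()` already hints at sparse columns: `Miles Price`, for example, has only 115 non-null values out of 3275. A compact way to rank columns by how much is missing is to take the mean of `isna()` per column. The sketch below uses a small hypothetical frame (made-up item names and values) in place of the real `housewares` table.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few housewares columns
df = pd.DataFrame({
    "Name": ["item_a", "item_b", "item_c", "item_d"],
    "Miles Price": [np.nan, np.nan, np.nan, 2000.0],
    "Pattern": ["A", np.nan, "B", np.nan],
    "Sell": [300, 275, 850, 1400],
})

# Fraction of missing values per column, most-missing first
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share)
```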

In [24]:
housewares.head()

,Name,Variation,Body Title,Pattern,Pattern Title,DIY,Body Customize,Pattern Customize,Kit Cost,Buy,...,Interact,Tag,Outdoor,Speaker Type,Lighting Type,Catalog,Filename,Variant ID,Internal ID,Unique Entry ID
0,acoustic guitar,Natural,Body,,,Yes,Yes,No,5.0,NFS,...,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,FtrAcorsticguitar_Remake_0_0,0_0,383,EpywQXABBcv2dipsP
1,acoustic guitar,Cherry,Body,,,Yes,Yes,No,5.0,NFS,...,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,FtrAcorsticguitar_Remake_1_0,1_0,383,K9she5Y4SuXA8MGBR
2,acoustic guitar,Brown,Body,,,Yes,Yes,No,5.0,NFS,...,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,FtrAcorsticguitar_Remake_2_0,2_0,383,vLq9iphAvALBXazDr
3,acoustic guitar,Blue,Body,,,Yes,Yes,No,5.0,NFS,...,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,FtrAcorsticguitar_Remake_3_0,3_0,383,nuqeFzNE5PneqGHaj
4,acoustic guitar,White,Body,,,Yes,Yes,No,5.0,NFS,...,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,FtrAcorsticguitar_Remake_4_0,4_0,383,DotFsojrhCwrLZ3TF
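`Buy` is stored as `object` rather than a number because items that cannot be bought are marked `NFS` (as in the acoustic guitar rows above). One way to make the column numeric, assuming `NFS` should simply become a missing value, is `pd.to_numeric` with `errors="coerce"`. The values below are hypothetical.

```python
import pandas as pd

# Hypothetical Buy values: prices stored as strings plus the "NFS" (not for sale) marker
buy = pd.Series(["NFS", "1100", "3400", "NFS"], name="Buy")

# Non-numeric entries such as "NFS" become NaN; the rest become floats
buy_numeric = pd.to_numeric(buy, errors="coerce")

# Keep the for-sale information as its own boolean flag
for_sale = buy_numeric.notna()
print(buy_numeric.tolist(), for_sale.tolist())
```

Whether `NFS` items are then dropped, kept with a missing price, or modeled separately is a separate design choice; the boolean flag preserves the information either way.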


## Removing unnecessary columns

The following columns are dropped and will not be considered in this project. They are either unique IDs for each item, metadata and notes that have no bearing on an item's price, or, in the case of `Color 1` and `Color 2`, the item's colors, which the `Variation` column already records.

- Body Title
- Unique Entry ID 
- Internal ID 
- Variant ID
- Filename
- HHA Concept 1
- HHA Concept 2 
- HHA Series
- HHA Set
- Version 
- Source Notes
- Color 1
- Color 2
- Kit Cost (not the price we're looking at for this project)
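The claim that `Color 1` and `Color 2` add nothing beyond `Variation` can be checked before dropping them: if every (`Name`, `Variation`) pair has only one distinct value in each color column, the color columns are redundant. The sketch below runs that check on hypothetical rows; `colors_redundant` is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

def colors_redundant(df: pd.DataFrame) -> bool:
    """True if every (Name, Variation) pair has at most one distinct Color 1 and Color 2."""
    counts = df.groupby(["Name", "Variation"])[["Color 1", "Color 2"]].nunique()
    return bool((counts <= 1).all().all())

# Hypothetical rows: item_a/V1 appears twice (e.g. two patterns) with identical colors
toy = pd.DataFrame({
    "Name": ["item_a", "item_a", "item_a", "item_b"],
    "Variation": ["V1", "V1", "V2", "V1"],
    "Color 1": ["Red", "Red", "Blue", "White"],
    "Color 2": ["Brown", "Brown", "Brown", "White"],
})
print(colors_redundant(toy))  # True

# If a pattern changed the colors of item_a/V1, the check would fail
toy.loc[1, "Color 1"] = "Green"
print(colors_redundant(toy))  # False
```

By default `groupby` leaves out rows whose `Variation` is missing, so items without variations would need a separate look.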

In [25]:
housewares = housewares.drop(['Body Title', 'Color 1', 'Color 2', 'Version', 'Source Notes', 'HHA Concept 1', 'HHA Concept 2', 'HHA Series', 'HHA Set', 'Unique Entry ID', 'Internal ID', 'Variant ID', 'Filename', 'Kit Cost'], axis=1)
print(housewares.columns)

Index(['Name', 'Variation', 'Pattern', 'Pattern Title', 'DIY',
       'Body Customize', 'Pattern Customize', 'Buy', 'Sell', 'Size',
       'Miles Price', 'Source', 'Interact', 'Tag', 'Outdoor', 'Speaker Type',
       'Lighting Type', 'Catalog'],
      dtype='object')


## Cleaning up the different item variations

The code below shows the number of variations each item has. We don't want to treat each variation as a separate item, since an item's price is the same regardless of its variation. Instead, we'd like to combine all the variations of an item into a single row and record how many variations it has.

To do this, we'll add a `VariationCount` column holding the number of variations, drop the `Variation` column, and then remove the duplicate rows.
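The three steps can be sketched on a handful of hypothetical rows (made-up names and prices) to show how mapping `value_counts()` back onto `Name` produces the count before the variation rows collapse.

```python
import pandas as pd

# Hypothetical rows: one row per variation, and every variation shares the item's price
df = pd.DataFrame({
    "Name": ["item_a", "item_a", "item_a", "item_b"],
    "Variation": ["Natural", "Cherry", "Blue", "White"],
    "Sell": [100, 100, 100, 50],
})

# 1. Attach each item's row count as VariationCount
df["VariationCount"] = df["Name"].map(df["Name"].value_counts())

# 2. Drop the column that distinguishes the variations
df = df.drop(columns="Variation")

# 3. Identical rows now collapse to one; drop=True discards the old index
collapsed = df.drop_duplicates().reset_index(drop=True)
print(collapsed)
```

`value_counts()` counts rows, so if an item has rows for several patterns per variation, the count reflects variation-pattern combinations rather than distinct variations; `groupby("Name")["Variation"].transform("nunique")` would count distinct variations instead.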

In [26]:
housewares['Name'].value_counts()

simple panel            64
changing room           64
loft bed with desk      64
rock guitar             64
electric guitar         56
                        ..
silver mic               1
simple DIY workbench     1
simple well              1
skeleton                 1
sandbox                  1
Name: Name, Length: 532, dtype: int64

In [27]:
housewares['VariationCount'] = housewares['Name'].map(housewares['Name'].value_counts())

In [28]:
housewares.head(8)

,Name,Variation,Pattern,Pattern Title,DIY,Body Customize,Pattern Customize,Buy,Sell,Size,Miles Price,Source,Interact,Tag,Outdoor,Speaker Type,Lighting Type,Catalog,VariationCount
0,acoustic guitar,Natural,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
1,acoustic guitar,Cherry,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
2,acoustic guitar,Brown,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
3,acoustic guitar,Blue,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
4,acoustic guitar,White,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
5,acoustic guitar,Black,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
6,acoustic guitar,Pink,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
7,air circulator,White,,,No,No,No,1100,275,1x1,,Nook's Cranny,Yes,Fan,Yes,Does not play music,No lighting,For sale,5


In [29]:
housewares = housewares.drop('Variation', axis=1)
housewares.head()

,Name,Pattern,Pattern Title,DIY,Body Customize,Pattern Customize,Buy,Sell,Size,Miles Price,Source,Interact,Tag,Outdoor,Speaker Type,Lighting Type,Catalog,VariationCount
0,acoustic guitar,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
1,acoustic guitar,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
2,acoustic guitar,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
3,acoustic guitar,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
4,acoustic guitar,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7


In [30]:
# drop=True discards the old index instead of keeping it as an 'index' column
housewares = housewares.drop_duplicates().reset_index(drop=True)
housewares.head()

,Name,Pattern,Pattern Title,DIY,Body Customize,Pattern Customize,Buy,Sell,Size,Miles Price,Source,Interact,Tag,Outdoor,Speaker Type,Lighting Type,Catalog,VariationCount
0,acoustic guitar,,,Yes,Yes,No,NFS,3210,1x1,,Crafting,Yes,Musical Instrument,No,Does not play music,No lighting,Not for sale,7
1,air circulator,,,No,No,No,1100,275,1x1,,Nook's Cranny,Yes,Fan,Yes,Does not play music,No lighting,For sale,5
2,alto saxophone,,,No,No,No,3400,850,1x1,,Nook's Cranny,Yes,Musical Instrument,No,Does not play music,No lighting,For sale,1
3,anatomical model,,,No,No,No,3500,875,1x1,,Nook's Cranny,No,Hospital,No,Does not play music,No lighting,For sale,1
4,anchor statue,,,No,Yes,No,NFS,1400,1x1,,Fishing Tourney,No,Seaside,Yes,Does not play music,No lighting,Not for sale,6
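`drop_duplicates()` only removes rows that match in every remaining column, so an item whose rows still differ somewhere (for example in `Pattern` or `Pattern Title`, which were kept) survives as several rows. A quick check, sketched here on hypothetical rows, is `Name.duplicated(keep=False)`. If exactly one row per item is wanted, `drop_duplicates(subset="Name")` is one option, though it keeps only the first row's values.

```python
import pandas as pd

# Hypothetical rows after dropping Variation: item_a still differs by Pattern
df = pd.DataFrame({
    "Name": ["item_a", "item_a", "item_b"],
    "Pattern": ["Stripes", "Dots", None],
    "Sell": [100, 100, 50],
})

deduped = df.drop_duplicates()

# Names that still appear on more than one row after deduplication
leftover = deduped.loc[deduped["Name"].duplicated(keep=False), "Name"].unique().tolist()
print(leftover)  # ['item_a']

# One option: keep only the first row per item
one_per_item = deduped.drop_duplicates(subset="Name").reset_index(drop=True)
print(one_per_item)
```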


## Cleaning up columns

Now that the columns have been decided on, it's time to clean up columns with NaN values and separate the dataset into items that cost Miles and items that cost 