Install and setup environment #1
no one |
Wow, "no one" is also my favorite! The dataset we'll be using is a compilation of stats and traits for the Pokémon video games. Pokémon is a popular game series for generations of Nintendo handheld video game consoles, where players collect and train animal-like creatures called Pokémon. We'll be creating a model to try to predict whether a Pokémon is a legendary Pokémon, a rare type of Pokémon that is the only one of its species. There are a lot of existing compilations of Pokémon stats, but we'll be using a CSV version found on Kaggle. There's a download button on the website, so save the file to your computer and we can begin. First, we need to read in the CSV file. We'll be doing so using pandas:
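Here's a minimal sketch of that step. The original code isn't shown here, so treat this as an illustration: the four rows are a tiny stand-in for the Kaggle file, and the "Name" column and the file name "pokemon.csv" are assumptions, not something confirmed above.

```python
import io

import pandas as pd

# Tiny stand-in for the downloaded file so the example runs on its own.
# With the real data you would pass the path instead, for example
# pd.read_csv("pokemon.csv") (the file name is an assumption).
SAMPLE_CSV = io.StringIO(
    "Name,Type_1,HP,Attack,isLegendary\n"
    "Bulbasaur,Grass,45,49,False\n"
    "Charmander,Fire,39,52,False\n"
    "Squirtle,Water,44,48,False\n"
    "Mewtwo,Psychic,106,110,True\n"
)

# read_csv parses the header row into column names and infers column types.
df = pd.read_csv(SAMPLE_CSV)

# Peek at the first rows to confirm the file loaded as expected.
print(df.head())
```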
Next, let's see what the categories of data are. These were also listed on the Kaggle page, but that won't be the case for most real-world data:
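One way to list the categories is the `columns` attribute of the DataFrame. This sketch builds a small stand-in DataFrame so it runs by itself; only "Type_1", "HP", "Attack" and "isLegendary" are named in this thread, and the "Name" column is an assumption.

```python
import pandas as pd

# Illustrative stand-in for the loaded dataset (not the full Kaggle file).
df = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Charmander", "Squirtle", "Mewtwo"],
        "Type_1": ["Grass", "Fire", "Water", "Psychic"],
        "HP": [45, 39, 44, 106],
        "Attack": [49, 52, 48, 110],
        "isLegendary": [False, False, False, True],
    }
)

# The column labels are the "categories" of data.
column_names = df.columns.tolist()
print(column_names)

# dtypes shows which columns are already numerical and which are not.
print(df.dtypes)
```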
Okay, so we have a lot of types of data here! Some of these descriptions might be confusing to those who aren't very familiar with the games. That's okay, we'll narrow our focus a little and only select the categories we think will be relevant. It's always nice to have more data to train the model with, but it also takes time to clean and prepare that data. We'll be keeping it simple here:
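Narrowing the data down could look something like this. The exact set of columns kept in the course isn't shown here, so the list below is only an assumption built from the columns this thread mentions.

```python
import pandas as pd

# Illustrative stand-in for the loaded dataset.
df = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Charmander", "Squirtle", "Mewtwo"],
        "Type_1": ["Grass", "Fire", "Water", "Psychic"],
        "HP": [45, 39, 44, 106],
        "Attack": [49, 52, 48, 110],
        "isLegendary": [False, False, False, True],
    }
)

# Hypothetical choice of relevant columns; adjust to taste.
KEEP = ["isLegendary", "Type_1", "HP", "Attack"]

# Indexing with a list of labels returns a new DataFrame with only those columns.
df = df[KEEP]
print(df.columns.tolist())
```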
Which library did we use to read our CSV file? Leave a comment with your answer to continue |
pandas |
Correct! We used "pandas". 🐼 Now that we can see all of our data, we'll need to format it so it can be read by the model. First, we need to make sure all the data is numerical. A lot of our data already is, such as stats like 'HP' and 'Attack'. Great! A few of the categories aren't numerical, however. One example is the category that we'll be training our model to detect: the "isLegendary" column of data. These are the labels that we will eventually separate from the rest of the data and use as an answer key for the model's training. We'll convert this column from boolean "False" and "True" values to the equivalent "0" and "1" integers:
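A short sketch of that conversion, assuming the column was loaded as booleans. The sample rows are a stand-in for the real data.

```python
import pandas as pd

# Illustrative stand-in for the loaded dataset.
df = pd.DataFrame(
    {
        "HP": [45, 39, 44, 106],
        "Attack": [49, 52, 48, 110],
        "isLegendary": [False, False, False, True],
    }
)

# astype(int) maps False to 0 and True to 1.
df["isLegendary"] = df["isLegendary"].astype(int)
print(df["isLegendary"].tolist())
```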
There are a few other categories that we'll need to convert as well. Let's look at "Type_1" as an example. Pokémon have associated elements, such as water and fire. Our first intuition for converting these to numbers could be to just assign a number to each category, such as: Water = 1, Fire = 2, Grass = 3 and so on. This isn't a good idea because these numerical assignments aren't ordinal; they don't lie on a scale. By doing this, we would be implying that Water is closer to Fire than it is to Grass, which doesn't really make sense. The solution is to create dummy variables, which means creating a new column for each possible value. There will be a column called "Water" that holds a 1 for a water Pokémon and a 0 otherwise. Then there will be another column called "Fire" that holds a 1 for a fire Pokémon, and so forth for the rest of the types. This prevents us from implying any pattern or direction among the types. Let's do that:
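Here's one possible sketch of a function that does this with `pd.get_dummies`. The helper name `add_dummies` and the sample rows are hypothetical, and this may differ from the function the course actually used.

```python
import pandas as pd


def add_dummies(df, categories):
    """Replace each listed column with one 0/1 column per distinct value.

    Hypothetical helper for illustration only.
    """
    for name in categories:
        # One new column per value in the category, e.g. "Water", "Fire".
        dummies = pd.get_dummies(df[name], dtype=int)
        # Attach the new columns, then drop the original text column.
        df = pd.concat([df, dummies], axis=1)
        df = df.drop(name, axis=1)
    return df


# Illustrative stand-in for the loaded dataset.
df = pd.DataFrame(
    {
        "Type_1": ["Grass", "Fire", "Water", "Psychic"],
        "HP": [45, 39, 44, 106],
    }
)

encoded = add_dummies(df, ["Type_1"])
print(encoded)
```

Each row ends up with exactly one 1 among the new type columns, so no ordering between types is implied.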
This function first uses
Why do we need to create dummy variables? A. Some categories are not numerical. Leave a comment with your answer to continue |
D |
Great! Now that our data is in a workable form, we can start processing it for machine learning. I've opened a new issue for you with the next steps. |
Welcome to the world of machine learning with TensorFlow! Working with TensorFlow can seem intimidating at first, but this tutorial will start with the basics to ensure you have a strong foundation with the package. This tutorial will be focusing on classifying and predicting Pokémon, but the elements discussed within it can certainly be helpful when using TensorFlow for other ideas, as well. Without further ado, let's begin!
First, let's download TensorFlow through pip. While you can install the version of TensorFlow that uses your GPU, we'll be using the CPU-driven TensorFlow. Type this into your terminal:
pip install tensorflow
Now that it's installed, we can truly begin. Let's import TensorFlow and a few other packages we'll need. All of this course involves using the command line interface. Enter these commands to import the necessary packages:
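The original import commands aren't shown here, so the following is only a sketch. The package list (tensorflow, numpy, pandas) is an assumption, and `missing_packages` is a hypothetical helper that checks what is installed before attempting the imports.

```python
import importlib.util

# Assumed package list; the course may use others as well.
PACKAGES = ["tensorflow", "numpy", "pandas"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(PACKAGES)
if missing:
    # Nothing is imported in this case; install the packages and run again.
    print("Install these first: pip install " + " ".join(missing))
else:
    import numpy as np
    import pandas as pd
    import tensorflow as tf

    print("TensorFlow version:", tf.__version__)
```

If everything is installed, the conventional aliases `tf`, `np` and `pd` are then available for the rest of the session.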
Leave a comment with your favorite Pokémon (such as Pikachu) to continue.