# University of Stirling

# ITNPBD2 Representing and Manipulating Data

# Assignment Autumn 2025

# A Consultancy Job for JC Penney

This notebook contains the assignment instructions and also serves as the submission document for ITNPBD2. Read the instructions carefully and enter code into the cells as indicated.

You will need these five files, which were in the Zip file you downloaded from the course webpage:

- jcpenney_reviewers.json
- jcpenney_products.json
- products.csv
- reviews.csv
- users.csv

The data in these files describes products that have been sold by the American retail giant, JC Penney, and reviews by customers who bought them. Note that the product data is real, but the customer data is synthetic.

Your job is to process the data, as requested in the instructions in the markdown cells in this notebook.

# Completing the Assignment

Rename this file to be xxxxxx_BD2 where xxxxxx is your student number, then type your code and narrative description into the boxes provided. Add as many code and markdown cells as you need. The cells should contain:

- **Text narrative describing what you did with the data**
- **The code that performs the task you have described**
- **Comments that explain your code**

The final structure (in PDF) of your report must:
- **Start from the main insights observed (max 5 pages)**
- **Include as an appendix the source code used for producing those insights (max 15 pages)**
- **Include an AI cover sheet (provided on Canvas), which must contain a link to a versioned notebook file in OneDrive or another platform for version checks.**

# Marking Scheme
The assessment will be marked against the university Common Marking Scheme (CMS).

Here is a summary of what you need to achieve to gain a grade in the major grade bands:

|Grade|Requirement|
|:---|:---|
| Fail | You will fail if your code does not run or does not achieve even the basics of the task. You may also fail if you submit code without either comments or a text explanation of what the code does.|
| Pass | To pass, you must submit sufficient working code to show that you have mastered the basics of the task, even if not everything works completely. You must include some justifications for your choice of methods, but without mentioning alternatives. |
| Merit | For a merit, your code must be mostly correct, with only small problems or parts missing, and your comments must be useful rather than simply re-stating the code in English. Most choices for methods and structures should be explained and alternatives mentioned. |
| Distinction | For a distinction, your code must be working, correct, and well commented, and must show an appreciation of style, efficiency and reliability. All choices for methods and structures are concisely justified and alternatives are given well-thought-out consideration. For a distinction, your work should be good enough to present to executives at the company.|

The full details of the CMS can be found here:

https://www.stir.ac.uk/about/professional-services/student-academic-and-corporate-services/academic-registry/academic-policy-and-practice/quality-handbook/assessment-policy-and-procedure/appendix-2-postgraduate-common-marking-scheme/

Note that this means there are not certain numbers of marks allocated to each stage of the assignment. Your grade will reflect how well your solutions and comments demonstrate that you have achieved the learning outcomes of the task. 

## Submission
When you are ready to submit, **print** your notebook as PDF (go to File -> Print Preview) in the Jupyter menu. Make sure you have run all the cells and that their output is displayed. Any lines of code or comments that are not visible in the pdf should be broken across several lines. You can then submit the file online.

Late penalties will apply at a rate of three marks per day, up to a maximum of 7 days. After 7 days you will be given a mark of 0. Extensions will be considered under acceptable circumstances outside your control.

## Academic Integrity

This is an individual assignment, and so all submitted work must be fully your own work.

The University of Stirling is committed to protecting the quality and standards of its awards. Consequently, the University seeks to promote and nurture academic integrity, support staff academic integrity, and support students to understand and develop good academic skills that facilitate academic integrity.

In addition, the University deals decisively with all forms of Academic Misconduct.

Where a student does not act with academic integrity, their work or behaviour may demonstrate Poor Academic Practice or it may represent Academic Misconduct.

### Poor Academic Practice

Poor Academic Practice is defined as: "The submission of any type of assessment with a lack of referencing or inadequate referencing which does not effectively acknowledge the origin of words, ideas, images, tables, diagrams, maps, code, sound and any other sources used in the assessment."

### Academic Misconduct

Academic Misconduct is defined as: "any act or attempted act that does not demonstrate academic integrity and that may result in creating an unfair academic advantage for you or another person, or an academic disadvantage for any other member or members of the academic community."

Plagiarism is presenting somebody else’s work as your own **and includes the use of artificial intelligence tools beyond AIAS Level 2 or the use of Large Language Models**. Plagiarism is a form of academic misconduct and is taken very seriously by the University. Students found to have plagiarised work can have marks deducted and, in serious cases, even be expelled from the University. Do not submit any work that is not entirely your own. Do not collaborate with or get help from anybody else with this assignment.

The University of Stirling's full policy on Academic Integrity can be found at:

https://www.stir.ac.uk/about/professional-services/student-academic-and-corporate-services/academic-registry/academic-policy-and-practice/quality-handbook/academic-integrity-policy-and-academic-misconduct-procedure/

## The Assignment
Your task with this assignment is to use the data provided to demonstrate your Python data manipulation skills.

There are three `.csv` files and two `.json` files so you can process different types of data. The files also contain unstructured data in the form of natural language in English and links to images that you can access from the JC Penney website (use the field called `product_image_urls`).

Start with easy tasks to show you can read in a file, create some variables and data structures, and manipulate their contents. Then move onto something more interesting.

Look at the data that we provided with this assessment and think of something interesting to do with it using whatever libraries you like. Describe what you decide to do with the data and why it might be interesting or useful to the company to do it.

You can add additional data if you need to - either download it or access it using `requests`. Produce working code to implement your ideas in as many cells as you need below. There is no single right answer; the aim is simply to show you are competent in using Python for data analysis. Exactly how you do that is up to you.

For a distinction class grade, this must show originality, creative thinking, and insights beyond what you've been taught directly on the module.

## Structure
You may structure the appendix of the project how you wish, but here is a suggested guideline to help you organise your work, based on the CRISP-DM data science methodology:

 1. **Business understanding** - What business context is the data coming from? What insights would be valuable in that context, and what data would be required for that purpose?
 2. **Data understanding and preparation** - Explore the data and show you understand its structure and relations, with the aid of appropriate visualisation techniques. Assess the data quality, which insights you would be able to answer from it, and what preparation the data would require. Add new data from another source if required to bring new insights to the data you already have.
 3. **Data modeling (optional)** - Would modeling be required for the insights you have considered? Use appropriate techniques, if so.
 4. **Evaluation and deployment** - How do the insights you obtained help the company, and how should they be adopted in their business? If modeling techniques have been adopted, is their use scientifically sound and how should they be maintained?

# Remember to make sure you are working completely on your own.
# Don't work in a group or with a friend


## Introduction
Briefly describe the data sets and the purpose of the analysis.

## Data Preparation
Explain how I cleaned the data and the reasoning behind what I did. Can include a short summary of the finalised data.

## Analysis/Insights/Visualisation
What did I find out about the data set? Include the graphs. How do these insights help the company, and how can they be adopted into the business?

Ideas:
- Number of reviews by state: which states give the most reviews?
- Highest rated products?
- Highest rated brands?
- Most reviewed products?
- Price vs rating: is there any correlation?
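
Two of the ideas above (reviews by state, and price vs rating) can be sketched on toy data as follows. The three small tables are invented stand-ins that only borrow the column names of the cleaned files; none of the values come from the real data.

```python
import pandas as pd

# Toy stand-ins for the cleaned tables (all values invented for illustration).
reviews = pd.DataFrame({
    "uniq_id": ["a", "a", "b", "c"],
    "username": ["u1", "u2", "u1", "u3"],
    "score": [4, 2, 5, 3],
})
users = pd.DataFrame({
    "username": ["u1", "u2", "u3"],
    "state": ["Oregon", "Idaho", "Oregon"],
})
products = pd.DataFrame({
    "uniq_id": ["a", "b", "c"],
    "price": [10.0, 20.0, 30.0],
    "av_score": [3.0, 5.0, 3.0],
})

# Reviews per state: attach each reviewer's state to their reviews, then count.
reviews_by_state = (
    reviews.merge(users, on="username", how="left")["state"]
    .value_counts()
)

# Pearson correlation between price and average score.
price_rating_corr = products["price"].corr(products["av_score"])

print(reviews_by_state)
print(price_rating_corr)
```

A left merge keeps every review even if its username is missing from the users table, so unmatched reviews show up as a missing state rather than silently disappearing.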

## Conclusion
Summarise everything here and what it means for JC Penney


## Appendix

In [1]:
import pandas as pd  #Importing necessary libraries
import numpy as np
import json

def load_json_lines(filepath):
    """Read a file with one JSON object per line into a DataFrame."""
    data = []  #Creates an empty list to store the JSON objects
    with open(filepath, "r", encoding="utf-8") as f:  #Opens the file for reading (JSON is normally UTF-8)
        for line in f:
            try:
                data.append(json.loads(line))  #Parses the line and adds the object to the list
            except json.JSONDecodeError:
                continue  #Skips blank or malformed lines
    return pd.json_normalize(data)  #Flattens the list of objects into a pandas DataFrame

#Loading all the files
products = pd.read_csv("products[1].csv")
reviews = pd.read_csv("reviews[1].csv")
users = pd.read_csv("users[1].csv")
jc_products = load_json_lines("jcpenney_products[1].json")
jc_reviewers = load_json_lines("jcpenney_reviewers[1].json")
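
One alternative to the hand-written loader is pandas' own `read_json` with `lines=True`. The sketch below uses two invented records held in memory rather than the assignment files. Unlike the loader above, it does not flatten nested objects and it raises an error on a malformed line instead of skipping it.

```python
import io
import pandas as pd

# Two invented records in JSON Lines form (one JSON object per line).
sample = io.StringIO(
    '{"username": "u1", "reviewed": ["a", "b"]}\n'
    '{"username": "u2", "reviewed": []}\n'
)

# lines=True tells pandas to parse each line as a separate JSON object.
df = pd.read_json(sample, lines=True)

print(df.shape)
print(df)
```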

In [2]:
products.head(), products.info()  #To see an overview of the dataframe, column names/data types/non-null counts

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7982 entries, 0 to 7981
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Uniq_id      7982 non-null   object 
 1   SKU          7915 non-null   object 
 2   Name         7982 non-null   object 
 3   Description  7439 non-null   object 
 4   Price        5816 non-null   float64
 5   Av_Score     7982 non-null   float64
dtypes: float64(2), object(4)
memory usage: 374.3+ KB


(                            Uniq_id           SKU  \
 0  b6c0b6bea69c722939585baeac73c13d  pp5006380337   
 1  93e5272c51d8cce02597e3ce67b7ad0a  pp5006380337   
 2  013e320f2f2ec0cf5b3ff5418d688528  pp5006380337   
 3  505e6633d81f2cb7400c0cfa0394c427  pp5006380337   
 4  d969a8542122e1331e304b09f81a83f6  pp5006380337   
 
                                           Name  \
 0  Alfred Dunner® Essential Pull On Capri Pant   
 1  Alfred Dunner® Essential Pull On Capri Pant   
 2  Alfred Dunner® Essential Pull On Capri Pant   
 3  Alfred Dunner® Essential Pull On Capri Pant   
 4  Alfred Dunner® Essential Pull On Capri Pant   
 
                                          Description  Price  Av_Score  
 0  Youll return to our Alfred Dunner pull-on capr...  41.09     2.625  
 1  Youll return to our Alfred Dunner pull-on capr...  41.09     3.000  
 2  Youll return to our Alfred Dunner pull-on capr...  41.09     2.625  
 3  Youll return to our Alfred Dunner pull-on capr...  41.09     3.500  
 

In [3]:
products.columns = (    #Begin to clean the column names
    products.columns
    .str.lower()       #Put it all into lower case
    .str.strip()       #Remove spaces
    .str.replace(" ", "_")  #To use underscores instead
)

print(products.columns)

Index(['uniq_id', 'sku', 'name', 'description', 'price', 'av_score'], dtype='object')


In [4]:
products.isna().sum() #To see if there are any missing values

uniq_id           0
sku              67
name              0
description     543
price          2166
av_score          0
dtype: int64

In [5]:
#uniq_id, name and av_score are complete (7982 non-null), so description, price and sku are filled to match
#Replace missing descriptions with a clear placeholder
products["description"] = products["description"].fillna("No description available")

#Fill missing prices with the median price: the median is not pulled by extreme prices,
#and filling keeps those 2166 rows instead of removing them
products["price"] = products["price"].fillna(products["price"].median())

In [6]:
products.isna().sum() 

uniq_id         0
sku            67
name            0
description     0
price           0
av_score        0
dtype: int64
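
Once missing prices are filled with the median, the filled rows can no longer be told apart from real prices. One possible safeguard, sketched here on invented values, is to record a flag column before filling; the column name `price_imputed` is a hypothetical choice, not part of the original data.

```python
import numpy as np
import pandas as pd

# Invented prices with two gaps.
products = pd.DataFrame({"price": [10.0, np.nan, 30.0, np.nan, 50.0]})

# Record which prices were missing before they are overwritten.
products["price_imputed"] = products["price"].isna()

# Fill the gaps with the median of the observed prices.
products["price"] = products["price"].fillna(products["price"].median())

print(products)
```

With the flag in place, any later analysis of price can be repeated on `products[~products["price_imputed"]]` to check that the imputed rows are not driving the result.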

In [7]:
products["sku"] = products["sku"].fillna("unknown") # Replace missing SKU values with "unknown" to keep the row but mark it clearly.
products.isna().sum() 



uniq_id        0
sku            0
name           0
description    0
price          0
av_score       0
dtype: int64

In [8]:
products.head(), products.info() #Review the dataset again to see if the changes have been made

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7982 entries, 0 to 7981
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   uniq_id      7982 non-null   object 
 1   sku          7982 non-null   object 
 2   name         7982 non-null   object 
 3   description  7982 non-null   object 
 4   price        7982 non-null   float64
 5   av_score     7982 non-null   float64
dtypes: float64(2), object(4)
memory usage: 374.3+ KB


(                            uniq_id           sku  \
 0  b6c0b6bea69c722939585baeac73c13d  pp5006380337   
 1  93e5272c51d8cce02597e3ce67b7ad0a  pp5006380337   
 2  013e320f2f2ec0cf5b3ff5418d688528  pp5006380337   
 3  505e6633d81f2cb7400c0cfa0394c427  pp5006380337   
 4  d969a8542122e1331e304b09f81a83f6  pp5006380337   
 
                                           name  \
 0  Alfred Dunner® Essential Pull On Capri Pant   
 1  Alfred Dunner® Essential Pull On Capri Pant   
 2  Alfred Dunner® Essential Pull On Capri Pant   
 3  Alfred Dunner® Essential Pull On Capri Pant   
 4  Alfred Dunner® Essential Pull On Capri Pant   
 
                                          description  price  av_score  
 0  Youll return to our Alfred Dunner pull-on capr...  41.09     2.625  
 1  Youll return to our Alfred Dunner pull-on capr...  41.09     3.000  
 2  Youll return to our Alfred Dunner pull-on capr...  41.09     2.625  
 3  Youll return to our Alfred Dunner pull-on capr...  41.09     3.500  
 

In [9]:
reviews.head(), reviews.info() #Now looking at the reviews file

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39063 entries, 0 to 39062
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Uniq_id   39063 non-null  object
 1   Username  39063 non-null  object
 2   Score     39063 non-null  int64 
 3   Review    39063 non-null  object
dtypes: int64(1), object(3)
memory usage: 1.2+ MB


(                            Uniq_id  Username  Score  \
 0  b6c0b6bea69c722939585baeac73c13d  fsdv4141      2   
 1  b6c0b6bea69c722939585baeac73c13d  krpz1113      1   
 2  b6c0b6bea69c722939585baeac73c13d  mbmg3241      2   
 3  b6c0b6bea69c722939585baeac73c13d  zeqg1222      0   
 4  b6c0b6bea69c722939585baeac73c13d  nvfn3212      3   
 
                                               Review  
 0  You never have to worry about the fit...Alfred...  
 1  Good quality fabric. Perfect fit. Washed very ...  
 2  I do not normally wear pants or capris that ha...  
 3  I love these capris! They fit true to size and...  
 4  This product is very comfortable and the fabri...  ,
 None)

In [10]:
reviews.columns = (  #To keep column names consistent across every data set, I make them all lower case, strip surrounding spaces and replace inner spaces with "_"
    reviews.columns
    .str.lower()
    .str.strip()
    .str.replace(" ", "_")
)
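
The same column-cleaning steps are applied to every table in this appendix, so they could be wrapped in a small helper. This is only a sketch: the function name `clean_columns` is hypothetical and the demo frame is invented.

```python
import pandas as pd

def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with trimmed, lower-case, underscore-separated column names."""
    out = df.copy()
    out.columns = (
        out.columns
        .str.strip()                        # remove surrounding spaces
        .str.lower()                        # lower case
        .str.replace(" ", "_", regex=False) # inner spaces become underscores
    )
    return out

# Invented column names that mimic the mixed styles seen in the files.
demo = pd.DataFrame(columns=["Uniq_id", " Bought With "])
cleaned_names = list(clean_columns(demo).columns)
print(cleaned_names)
```

Returning a copy rather than renaming in place keeps the raw table available for comparison if a later step needs the original names.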

In [12]:
products["uniq_id"].nunique(), reviews["uniq_id"].nunique()  #Checking that both files contain the same number of distinct product IDs, as a first sign that they describe the same products

(7982, 7982)
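
Equal counts of distinct IDs do not by themselves prove that the two files contain the same IDs. A stricter check compares the ID sets directly, as in this sketch on invented IDs where both tables have three distinct values but one ID differs.

```python
import pandas as pd

# Invented IDs: both tables have three distinct values, but they are not the same three.
products = pd.DataFrame({"uniq_id": ["a", "b", "c"]})
reviews = pd.DataFrame({"uniq_id": ["a", "a", "b", "d"]})

product_ids = set(products["uniq_id"])
review_ids = set(reviews["uniq_id"])

# IDs present in one table but not in the other.
only_in_products = product_ids - review_ids
only_in_reviews = review_ids - product_ids

print(len(product_ids), len(review_ids))
print(only_in_products, only_in_reviews)
```

If both difference sets are empty on the real data, every product has at least one review and every review points to a known product.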

In [13]:
users.head(), users.info()  #Looking at users file summary

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Username  5000 non-null   object
 1   DOB       5000 non-null   object
 2   State     5000 non-null   object
dtypes: object(3)
memory usage: 117.3+ KB


(   Username         DOB          State
 0  bkpn1412  31.07.1983         Oregon
 1  gqjs4414  27.07.1998  Massachusetts
 2  eehe1434  08.08.1950          Idaho
 3  hkxj1334  03.08.1969        Florida
 4  jjbd1412  26.07.2001        Georgia,
 None)

In [14]:
users.columns = (  #Keeping consistency in column names
    users.columns
    .str.lower()
    .str.strip()
    .str.replace(" ", "_")
)

In [17]:
users.head(), users.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   username  5000 non-null   object
 1   dob       5000 non-null   object
 2   state     5000 non-null   object
dtypes: object(3)
memory usage: 117.3+ KB


(   username         dob          state
 0  bkpn1412  31.07.1983         Oregon
 1  gqjs4414  27.07.1998  Massachusetts
 2  eehe1434  08.08.1950          Idaho
 3  hkxj1334  03.08.1969        Florida
 4  jjbd1412  26.07.2001        Georgia,
 None)

In [20]:
users["dob"] = pd.to_datetime(users["dob"], format="%d.%m.%Y", errors="coerce")  #Converts the date of birth from strings to datetime objects for easier analysis

users["dob"].head()

0   1983-07-31
1   1998-07-27
2   1950-08-08
3   1969-08-03
4   2001-07-26
Name: dob, dtype: datetime64[ns]
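
With `dob` stored as a datetime, an age column is one possible next step. The sketch below uses invented dates and a fixed reference date, which is an assumption made so the result is reproducible; a date that fails to parse becomes `NaT` and therefore has a missing age.

```python
import pandas as pd

# Invented dates in the same day.month.year format, plus one unparseable value.
users = pd.DataFrame({"dob": ["31.07.1983", "27.07.1998", "not a date"]})
users["dob"] = pd.to_datetime(users["dob"], format="%d.%m.%Y", errors="coerce")

# Reference date is an assumption, fixed here for reproducibility.
ref = pd.Timestamp("2025-01-01")

# True where the birthday has already occurred in the reference year.
had_birthday = (users["dob"].dt.month < ref.month) | (
    (users["dob"].dt.month == ref.month) & (users["dob"].dt.day <= ref.day)
)

# Year difference, minus one if the birthday is still to come.
users["age"] = ref.year - users["dob"].dt.year - (~had_birthday).astype(int)

print(users)
```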

In [22]:
users["state"].unique(), users["state"].nunique()  #Making sure there are no duplicate states, in case some were repeated with different capitalisation

(array(['Oregon', 'Massachusetts', 'Idaho', 'Florida', 'Georgia',
        'Montana', 'Pennsylvania', 'Connecticut', 'Arkansas', 'Nebraska',
        'California', 'New Hampshire', 'District of Columbia',
        'Washington', 'Minnesota', 'New Mexico', 'Virginia', 'Kansas',
        'Illinois', 'North Dakota', 'Colorado', 'New York',
        'Minor Outlying Islands', 'Northern Mariana Islands',
        'West Virginia', 'Texas', 'South Dakota', 'Maryland', 'Maine',
        'Ohio', 'Rhode Island', 'Michigan', 'Alaska', 'Iowa', 'Oklahoma',
        'Mississippi', 'South Carolina', 'Missouri', 'New Jersey',
        'Tennessee', 'North Carolina', 'Guam', 'Wyoming', 'Delaware',
        'Vermont', 'Indiana', 'Louisiana', 'Wisconsin', 'Hawaii',
        'Puerto Rico', 'Alabama', 'Kentucky', 'Arizona', 'Nevada', 'Utah',
        'American Samoa', 'U.S. Virgin Islands'], dtype=object),
 57)
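
Reading the list by eye works for 57 values, but the same check can be done programmatically by comparing the number of unique values before and after normalising case and whitespace. The state values below are invented to show a case where a duplicate is hiding.

```python
import pandas as pd

# Invented values: "oregon " differs from "Oregon" only by case and a trailing space.
states = pd.Series(["Oregon", "oregon ", "Idaho", "Guam"])

# Normalise only for the comparison; the original column is left unchanged.
normalised = states.str.strip().str.casefold()

n_raw = states.nunique()
n_normalised = normalised.nunique()
print(n_raw, n_normalised)
```

If the two counts are equal on the real column, there are no duplicates that differ only by capitalisation or surrounding spaces.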

In [23]:
jc_products.head(), jc_products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7982 entries, 0 to 7981
Data columns (total 15 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   uniq_id                 7982 non-null   object 
 1   sku                     7982 non-null   object 
 2   name_title              7982 non-null   object 
 3   description             7982 non-null   object 
 4   list_price              7982 non-null   object 
 5   sale_price              7982 non-null   object 
 6   category                7982 non-null   object 
 7   category_tree           7982 non-null   object 
 8   average_product_rating  7982 non-null   float64
 9   product_url             7982 non-null   object 
 10  product_image_urls      7982 non-null   object 
 11  brand                   7982 non-null   object 
 12  total_number_reviews    7982 non-null   int64  
 13  Reviews                 7982 non-null   object 
 14  Bought With             7982 non-null   

(                            uniq_id           sku  \
 0  b6c0b6bea69c722939585baeac73c13d  pp5006380337   
 1  93e5272c51d8cce02597e3ce67b7ad0a  pp5006380337   
 2  013e320f2f2ec0cf5b3ff5418d688528  pp5006380337   
 3  505e6633d81f2cb7400c0cfa0394c427  pp5006380337   
 4  d969a8542122e1331e304b09f81a83f6  pp5006380337   
 
                                     name_title  \
 0  Alfred Dunner® Essential Pull On Capri Pant   
 1  Alfred Dunner® Essential Pull On Capri Pant   
 2  Alfred Dunner® Essential Pull On Capri Pant   
 3  Alfred Dunner® Essential Pull On Capri Pant   
 4  Alfred Dunner® Essential Pull On Capri Pant   
 
                                          description list_price sale_price  \
 0  You'll return to our Alfred Dunner pull-on cap...      41.09      24.16   
 1  You'll return to our Alfred Dunner pull-on cap...      41.09      24.16   
 2  You'll return to our Alfred Dunner pull-on cap...      41.09      24.16   
 3  You'll return to our Alfred Dunner pull-on cap

In [26]:
jc_products.columns = (  #Noticed some capital letters again in column names, so doing this for consistency
    jc_products.columns
    .str.lower()
    .str.strip()
    .str.replace(" ", "_")
)

jc_products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7982 entries, 0 to 7981
Data columns (total 15 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   uniq_id                 7982 non-null   object 
 1   sku                     7982 non-null   object 
 2   name_title              7982 non-null   object 
 3   description             7982 non-null   object 
 4   list_price              7982 non-null   object 
 5   sale_price              7982 non-null   object 
 6   category                7982 non-null   object 
 7   category_tree           7982 non-null   object 
 8   average_product_rating  7982 non-null   float64
 9   product_url             7982 non-null   object 
 10  product_image_urls      7982 non-null   object 
 11  brand                   7982 non-null   object 
 12  total_number_reviews    7982 non-null   int64  
 13  reviews                 7982 non-null   object 
 14  bought_with             7982 non-null   

In [27]:
jc_products.isna().sum() #Noticed all non-null counts correct but double checking for missing values

uniq_id                   0
sku                       0
name_title                0
description               0
list_price                0
sale_price                0
category                  0
category_tree             0
average_product_rating    0
product_url               0
product_image_urls        0
brand                     0
total_number_reviews      0
reviews                   0
bought_with               0
dtype: int64
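
A zero count from `isna()` does not rule out missing prices here, because `list_price` and `sale_price` are stored as `object` (text) columns. Missing values may be held as empty strings or other text, which is an assumption that would need checking on the real file. A sketch of the conversion on invented values:

```python
import pandas as pd

# Invented text prices, including an empty string and a non-numeric entry.
jc_products = pd.DataFrame({"list_price": ["41.09", "", "24.16", "n/a"]})

# errors="coerce" turns anything that is not a number into NaN instead of raising.
jc_products["list_price_num"] = pd.to_numeric(
    jc_products["list_price"], errors="coerce"
)

n_missing = int(jc_products["list_price_num"].isna().sum())
print(jc_products)
print(n_missing)
```

Keeping the numeric values in a new column (the name `list_price_num` is hypothetical) leaves the original text available for inspecting what the non-numeric entries actually contain.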

In [28]:
jc_reviewers.head(),jc_reviewers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Username  5000 non-null   object
 1   DOB       5000 non-null   object
 2   State     5000 non-null   object
 3   Reviewed  5000 non-null   object
dtypes: object(4)
memory usage: 156.4+ KB


(   Username         DOB          State  \
 0  bkpn1412  31.07.1983         Oregon   
 1  gqjs4414  27.07.1998  Massachusetts   
 2  eehe1434  08.08.1950          Idaho   
 3  hkxj1334  03.08.1969        Florida   
 4  jjbd1412  26.07.2001        Georgia   
 
                                             Reviewed  
 0                 [cea76118f6a9110a893de2b7654319c0]  
 1                 [fa04fe6c0dd5189f54fe600838da43d3]  
 2                                                 []  
 3  [f129b1803f447c2b1ce43508fb822810, 3b0c9bc0be6...  
 4                                                 []  ,
 None)

In [30]:
jc_reviewers.columns = (  #Consistency in column names
    jc_reviewers.columns
    .str.lower()
    .str.strip()
    .str.replace(" ", "_")
)

jc_reviewers.info(), jc_reviewers.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   username  5000 non-null   object
 1   dob       5000 non-null   object
 2   state     5000 non-null   object
 3   reviewed  5000 non-null   object
dtypes: object(4)
memory usage: 156.4+ KB


(None,
    username         dob          state  \
 0  bkpn1412  31.07.1983         Oregon   
 1  gqjs4414  27.07.1998  Massachusetts   
 2  eehe1434  08.08.1950          Idaho   
 3  hkxj1334  03.08.1969        Florida   
 4  jjbd1412  26.07.2001        Georgia   
 
                                             reviewed  
 0                 [cea76118f6a9110a893de2b7654319c0]  
 1                 [fa04fe6c0dd5189f54fe600838da43d3]  
 2                                                 []  
 3  [f129b1803f447c2b1ce43508fb822810, 3b0c9bc0be6...  
 4                                                 []  )

In [31]:
jc_reviewers["dob"] = pd.to_datetime(jc_reviewers["dob"], format="%d.%m.%Y", errors="coerce")  #Converting to datetime objects again 

jc_reviewers.head()

   username         dob          state                                           reviewed
0  bkpn1412  1983-07-31         Oregon                 [cea76118f6a9110a893de2b7654319c0]
1  gqjs4414  1998-07-27  Massachusetts                 [fa04fe6c0dd5189f54fe600838da43d3]
2  eehe1434  1950-08-08          Idaho                                                 []
3  hkxj1334  1969-08-03        Florida  [f129b1803f447c2b1ce43508fb822810, 3b0c9bc0be6...
4  jjbd1412  2001-07-26        Georgia                                                 []
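
The `reviewed` column holds a list of product IDs per user, which is awkward to join or count. One way to reshape it, sketched on invented rows, is `DataFrame.explode`, which gives one row per user and product; users with an empty list produce a missing value that is dropped here.

```python
import pandas as pd

# Invented rows with the same shape as the reviewers table.
jc_reviewers = pd.DataFrame({
    "username": ["u1", "u2", "u3"],
    "reviewed": [["a"], [], ["b", "c"]],
})

# One row per (username, product id); empty lists become NaN and are removed.
user_products = (
    jc_reviewers.explode("reviewed")
    .dropna(subset=["reviewed"])
    .rename(columns={"reviewed": "uniq_id"})
    .reset_index(drop=True)
)

print(user_products)
```

The resulting long table can be merged with the products table on `uniq_id` in the same way as the reviews file.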


In [33]:
jc_reviewers.isna().sum() #Checking for missing values

username    0
dob         0
state       0
reviewed    0
dtype: int64

In [34]:
jc_reviewers["state"].unique(), jc_reviewers["state"].nunique()  #Making sure the states match up with the users file

(array(['Oregon', 'Massachusetts', 'Idaho', 'Florida', 'Georgia',
        'Montana', 'Pennsylvania', 'Connecticut', 'Arkansas', 'Nebraska',
        'California', 'New Hampshire', 'District of Columbia',
        'Washington', 'Minnesota', 'New Mexico', 'Virginia', 'Kansas',
        'Illinois', 'North Dakota', 'Colorado', 'New York',
        'Minor Outlying Islands', 'Northern Mariana Islands',
        'West Virginia', 'Texas', 'South Dakota', 'Maryland', 'Maine',
        'Ohio', 'Rhode Island', 'Michigan', 'Alaska', 'Iowa', 'Oklahoma',
        'Mississippi', 'South Carolina', 'Missouri', 'New Jersey',
        'Tennessee', 'North Carolina', 'Guam', 'Wyoming', 'Delaware',
        'Vermont', 'Indiana', 'Louisiana', 'Wisconsin', 'Hawaii',
        'Puerto Rico', 'Alabama', 'Kentucky', 'Arizona', 'Nevada', 'Utah',
        'American Samoa', 'U.S. Virgin Islands'], dtype=object),
 57)
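
Matching lists of unique states show that the two tables use the same vocabulary, not that each user has the same state in both. A row-level comparison could look like the sketch below, which uses invented rows; the suffixes `_csv` and `_json` are hypothetical labels for the two sources.

```python
import pandas as pd

# Invented rows: same users and states, listed in a different order.
users = pd.DataFrame({"username": ["u1", "u2"], "state": ["Oregon", "Idaho"]})
jc_reviewers = pd.DataFrame({"username": ["u2", "u1"], "state": ["Idaho", "Oregon"]})

# Align both tables on username; indicator=True records where each row came from.
merged = users.merge(
    jc_reviewers, on="username", how="outer",
    suffixes=("_csv", "_json"), indicator=True,
)

# A mismatch is a user missing from one table or a state that differs.
mismatches = merged[
    (merged["_merge"] != "both") | (merged["state_csv"] != merged["state_json"])
]

print(len(mismatches))
```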