# NAICS
North American Industry Classification System

David Tersegno

4/15/22

DSIR222 group project


Looking at [this code system](https://www2.census.gov/programs-surveys/cbp/technical-documentation/reference/naics-descriptions/naics2017.txt) for industries relevant to our project. Later we'll check how many warehousing, storage, logistics, shipping, etc. establishments show up in the associated data.
The file had to be re-saved as UTF-8. It originally came as ANSI, which `pd.read_csv` (UTF-8 by default) doesn't like.
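Instead of re-saving the file, pandas can usually decode it directly through the `encoding` argument. This is a minimal sketch that assumes "ANSI" here means Windows code page 1252, which is what that label usually refers to on US-English Windows (`latin-1` is a common fallback). The two-row file written below is made up for illustration and is not the real census file. `naics_sample` is a hypothetical name.

```python
import os
import tempfile

import pandas as pd

# Made-up sample in the same NAICS,DESCRIPTION layout; the "é" forces a non-ASCII byte.
sample = "NAICS,DESCRIPTION\n------,Total for all sectors\n999999,Made-up row: café\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "naics_sample_ansi.txt")
    # Write the bytes the way a Windows "ANSI" editor would (cp1252).
    with open(path, "w", encoding="cp1252") as f:
        f.write(sample)

    # With the default UTF-8 codec this read would fail on the "é" byte;
    # naming the encoding decodes it without re-saving the file.
    # dtype keeps the codes as strings.
    naics_sample = pd.read_csv(path, encoding="cp1252", dtype={"NAICS": str})

print(naics_sample)
```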

The data we hope to apply this to:

- [County Business Patterns datasets](https://www.census.gov/programs-surveys/cbp/data/datasets.html)
- [File server for the entire census data archive](https://www2.census.gov/)



In [71]:
import pandas as pd
import numpy as np

In [72]:
code_file_path = '../raw_data/naics2017_UTF8.txt'
naics = pd.read_csv(code_file_path)

In [73]:
naics

| | NAICS | DESCRIPTION |
|---|---|---|
| 0 | ------ | Total for all sectors |
| 1 | 11---- | Agriculture, Forestry, Fishing and Hunting |
| 2 | 113/// | Forestry and Logging |
| 3 | 1131// | Timber Tract Operations |
| 4 | 11311/ | Timber Tract Operations |
| ... | ... | ... |
| 1998 | 81394/ | Political Organizations |
| 1999 | 813940 | Political Organizations |
| 2000 | 81399/ | Other Similar Organizations (except Business, ... |
| 2001 | 813990 | Other Similar Organizations (except Business, ... |


[This report](https://www.epipeline.com/mktng/nl-articles/general-warehousing-and-storage-2015.html) has a short list of relevant codes. It focuses on 493110: General Warehousing and Storage. It also gives these cross-references:

> Cross References: Renting or leasing space for self storage--are classified in Industry 531130, Lessors of Miniwarehouses and Self-Storage Units; and
>
> Selling in combination with handling and/or distributing goods to other wholesale or retail establishments--are classified in Sector 42, Wholesale Trade.

In [74]:
naics[naics['NAICS'] == '493110']

| | NAICS | DESCRIPTION |
|---|---|---|
| 1282 | 493110 | General Warehousing and Storage |


In [75]:
# Cool. Keep track of its index label.
naics_list = [naics[naics['NAICS'] == '493110'].index[0]]
naics_list

[1282]

In [76]:
naics[naics['NAICS'] == '531130']

| | NAICS | DESCRIPTION |
|---|---|---|
| 1451 | 531130 | Lessors of Miniwarehouses and Self-Storage Units |


In [77]:
# Not sure if this is for us, but for now, keep track of it too.
naics_list.append(naics[naics['NAICS'] == '531130'].index[0])
naics_list

[1282, 1451]
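Index labels like 1282 and 1451 only identify rows within this particular load of the file. A sketch of an alternative, assuming it's fine to track the codes themselves: keep a list of codes and pull the matching rows with `Series.isin`. The four rows below are copied from the outputs in this notebook (the 113110 row's label 5 comes from the `head(20)` listing), and `codes_of_interest` and `tracked` are hypothetical names.

```python
import pandas as pd

# A few rows copied from the outputs above, standing in for the full table.
naics = pd.DataFrame(
    {
        "NAICS": ["------", "113110", "493110", "531130"],
        "DESCRIPTION": [
            "Total for all sectors",
            "Timber Tract Operations",
            "General Warehousing and Storage",
            "Lessors of Miniwarehouses and Self-Storage Units",
        ],
    },
    index=[0, 5, 1282, 1451],
)

# Track the codes rather than the row labels they happened to land on.
codes_of_interest = ["493110", "531130"]
tracked = naics[naics["NAICS"].isin(codes_of_interest)]
print(tracked)

# The index labels are still available if they're needed later.
naics_list = tracked.index.tolist()
print(naics_list)  # [1282, 1451]
```

Adding a new code of interest then means appending a six-digit string to `codes_of_interest`, with no separate lookup for its row label.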

In [78]:
#make a copy of the original, because I'm gonna start cutting through this for anything else of relevance.
naics_orig = naics.copy()

In [79]:
# Removes the first `number_to_cut` rows of the dataframe *in place* and returns
# the next `number_to_cut` rows so Jupyter displays them.
def chopper(dataframe, number_to_cut=20):
    chop_these_indices = dataframe.index[:number_to_cut]
    dataframe.drop(chop_these_indices, inplace=True)
    return dataframe.head(number_to_cut)
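Because `chopper` calls `drop(..., inplace=True)`, each call permanently removes rows from `naics`, which is why the copy in `naics_orig` matters. A sketch of a non-destructive alternative, assuming the goal is only to page through the table: step through it in fixed-size slices with `iloc`. `page_through` is a hypothetical helper, not part of this notebook.

```python
import pandas as pd


def page_through(dataframe, page_size=20):
    """Yield consecutive slices of `page_size` rows without modifying `dataframe`."""
    for start in range(0, len(dataframe), page_size):
        yield dataframe.iloc[start:start + page_size]


# Toy table standing in for naics: 45 rows labelled 0..44.
toy = pd.DataFrame({"NAICS": [f"{i:06d}" for i in range(45)]})

pages = list(page_through(toy))
for page in pages:
    # First label, last label, and row count of each page.
    print(page.index[0], page.index[-1], len(page))
```

In the notebook, `pages = page_through(naics)` followed by `next(pages)` in each cell would step through the table 20 rows at a time while leaving `naics` intact.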
    

In [81]:
naics.head(20)

| | NAICS | DESCRIPTION |
|---|---|---|
| 0 | ------ | Total for all sectors |
| 1 | 11---- | Agriculture, Forestry, Fishing and Hunting |
| 2 | 113/// | Forestry and Logging |
| 3 | 1131// | Timber Tract Operations |
| 4 | 11311/ | Timber Tract Operations |
| 5 | 113110 | Timber Tract Operations |
| 6 | 1132// | Forest Nurseries and Gathering of Forest Products |
| 7 | 11321/ | Forest Nurseries and Gathering of Forest Products |
| 8 | 113210 | Forest Nurseries and Gathering of Forest Products |
| 9 | 1133// | Logging |


In [82]:
chopper(naics)

| | NAICS | DESCRIPTION |
|---|---|---|
| 20 | 114210 | Hunting and Trapping |
| 21 | 115/// | Support Activities for Agriculture and Forestry |
| 22 | 1151// | Support Activities for Crop Production |
| 23 | 11511/ | Support Activities for Crop Production |
| 24 | 115111 | Cotton Ginning |
| 25 | 115112 | Soil Preparation, Planting, and Cultivating |
| 26 | 115113 | Crop Harvesting, Primarily by Machine |
| 27 | 115114 | Postharvest Crop Activities (except Cotton Gin... |
| 28 | 115115 | Farm Labor Contractors and Crew Leaders |
| 29 | 115116 | Farm Management Services |


In [83]:
chopper(naics)

| | NAICS | DESCRIPTION |
|---|---|---|
| 40 | 211120 | Crude Petroleum Extraction |
| 41 | 21113/ | Natural Gas Extraction |
| 42 | 211130 | Natural Gas Extraction |
| 43 | 212/// | Mining (except Oil and Gas) |
| 44 | 2121// | Coal Mining |
| 45 | 21211/ | Coal Mining |
| 46 | 212111 | Bituminous Coal and Lignite Surface Mining |
| 47 | 212112 | Bituminous Coal Underground Mining |
| 48 | 212113 | Anthracite Mining |
| 49 | 2122// | Metal Ore Mining |


In [84]:
chopper(naics)

| | NAICS | DESCRIPTION |
|---|---|---|
| 60 | 2123// | Nonmetallic Mineral Mining and Quarrying |
| 61 | 21231/ | Stone Mining and Quarrying |
| 62 | 212311 | Dimension Stone Mining and Quarrying |
| 63 | 212312 | Crushed and Broken Limestone Mining and Quarrying |
| 64 | 212313 | Crushed and Broken Granite Mining and Quarrying |
| 65 | 212319 | Other Crushed and Broken Stone Mining and Quar... |
| 66 | 21232/ | Sand, Gravel, Clay, and Ceramic and Refractory... |
| 67 | 212321 | Construction Sand and Gravel Mining |
| 68 | 212322 | Industrial Sand Mining |
| 69 | 212324 | Kaolin and Ball Clay Mining |


In [85]:
chopper(naics)

| | NAICS | DESCRIPTION |
|---|---|---|
| 80 | 213112 | Support Activities for Oil and Gas Operations |
| 81 | 213113 | Support Activities for Coal Mining |
| 82 | 213114 | Support Activities for Metal Mining |
| 83 | 213115 | Support Activities for Nonmetallic Minerals (e... |
| 84 | 22---- | Utilities |
| 85 | 221/// | Utilities |
| 86 | 2211// | Electric Power Generation, Transmission and Di... |
| 87 | 22111/ | Electric Power Generation |
| 88 | 221111 | Hydroelectric Power Generation |
| 89 | 221112 | Fossil Fuel Electric Power Generation |


In [87]:
# I might be interested in these electric power guys at another time.
# It looks like anything starting with 22 is a utility.
# This prefix thing might make the process of finding warehouses easier.

# Wait.

# These things are already organized for me. I should be able to just grab anything that shares a code prefix.
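NAICS codes are hierarchical: two digits give the sector, three the subsector, four the industry group, five the NAICS industry, and six the national industry. In this file the shorter codes appear to be padded to six characters with `-` (sectors) or `/` (everything below), so stripping the padding recovers each row's level, and the first two characters give its sector. A minimal sketch on rows copied from the listings above. `naics_level`, `level`, and `sector` are hypothetical names, not columns in the census file.

```python
import pandas as pd


def naics_level(code):
    """Number of real digits in a padded NAICS code ('11----' -> 2, '113110' -> 6)."""
    return len(code.rstrip("-/"))


sample = pd.DataFrame(
    {
        "NAICS": ["------", "11----", "113///", "1131//", "11311/", "113110", "22----", "221111"],
        "DESCRIPTION": [
            "Total for all sectors",
            "Agriculture, Forestry, Fishing and Hunting",
            "Forestry and Logging",
            "Timber Tract Operations",
            "Timber Tract Operations",
            "Timber Tract Operations",
            "Utilities",
            "Hydroelectric Power Generation",
        ],
    }
)

sample["level"] = sample["NAICS"].map(naics_level)
# Two-character prefix = sector ("--" for the all-sectors total row).
sample["sector"] = sample["NAICS"].str[:2]
print(sample)

# Keep only the six-digit (most detailed) codes.
print(sample[sample["level"] == 6])
```

On the full table, `naics_orig['NAICS'].map(naics_level)` should work the same way, assuming every code follows the padding pattern seen in these listings.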

In [95]:
# Compare the first two characters against the two-digit prefix.
naics_orig['starts_with_49'] = naics_orig['NAICS'].apply(lambda naic: naic[:2] == '49')

In [97]:
naics_orig['starts_with_49'].sum()
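For a two-digit prefix such as `'49'`, the comparison has to use the first two characters. A one-character slice like `naic[:1]` can never equal a two-character string, so it matches nothing and the sum comes out 0. Warehousing (493) sits under the 48-49 Transportation and Warehousing sector, so a `'49'` prefix looks like a reasonable first cut. A sketch comparing the slices and the vectorised `Series.str.startswith`, on codes copied from this notebook:

```python
import pandas as pd

# Codes copied from the lookups and listings above.
codes = pd.Series(["113110", "221111", "493110", "531130"])

# One-character slice: '4' != '49', so nothing ever matches.
wrong = codes.apply(lambda code: code[:1] == "49").sum()

# Two-character slice, or the vectorised string method, does match.
right_slice = codes.apply(lambda code: code[:2] == "49").sum()
right_str = codes.str.startswith("49").sum()

print(wrong, right_slice, right_str)  # 0 1 1
```

`str.startswith` avoids the lambda and makes the prefix length implicit, so the slice width can't drift out of sync with the prefix.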