# Identifying High-Skill, High-Demand Jobs for Career Counseling
### Step 1: Explore O*NET Data

This notebook will:
1. Download the required O*NET data files.
2. Load and examine the data structure.
3. Explore job-related skills, knowledge, and work activities.
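
Some of the file names loaded below contain spaces (for example `Work Context.txt`), and a space must be percent-encoded as `%20` in a URL. The sketch below shows one way to build those URLs instead of typing the encoding by hand. The helper name `onet_url` is hypothetical and not part of the original notebook; the base URL is the one used in the cells that follow.

```python
from urllib.parse import quote

# Base URL used throughout this notebook
DATA_PATH = "https://www.onetcenter.org/dl_files/database/db_29_1_text/"

def onet_url(file_name):
    """Return the full URL for an O*NET text file (hypothetical helper).

    quote() percent-encodes characters such as spaces, so
    "Work Context.txt" becomes "Work%20Context.txt".
    """
    return DATA_PATH + quote(file_name)

print(onet_url("Work Context.txt"))
print(onet_url("Skills.txt"))
```

The result can be passed to `pd.read_csv` in the same way as the hand-written URLs.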


In [None]:
# Importing necessary libraries
import pandas as pd

In [None]:

# Base URL of the O*NET 29.1 text database; files are read directly from it
DATA_PATH = "https://www.onetcenter.org/dl_files/database/db_29_1_text/"

# Load the Skills.txt file
skills_data = pd.read_csv(DATA_PATH + "Skills.txt", sep="\t", encoding='latin1')  # Tab-separated file

# Display the first few rows of the skills data
print("Skills Data Overview:")
skills_data.head()

In [9]:
# Load the Knowledge.txt file
knowledge_data = pd.read_csv(DATA_PATH + "Knowledge.txt", sep="\t", encoding='latin1')

# Display the first few rows of the knowledge data
print("Knowledge Data Overview:")
knowledge_data.head()

Knowledge Data Overview:


Unnamed: 0,O*NET-SOC Code,Element ID,Element Name,Scale ID,Data Value,N,Standard Error,Lower CI Bound,Upper CI Bound,Recommend Suppress,Not Relevant,Date,Domain Source
0,11-1011.00,2.C.1.a,Administration and Management,IM,4.78,28.0,0.1102,4.5564,5.0,N,,08/2023,Incumbent
1,11-1011.00,2.C.1.a,Administration and Management,LV,6.5,28.0,0.213,6.0666,6.9409,N,N,08/2023,Incumbent
2,11-1011.00,2.C.1.b,Administrative,IM,2.42,28.0,0.4651,1.4662,3.3749,N,,08/2023,Incumbent
3,11-1011.00,2.C.1.b,Administrative,LV,2.69,28.0,0.8678,0.9078,4.469,N,N,08/2023,Incumbent
4,11-1011.00,2.C.1.c,Economics and Accounting,IM,4.04,28.0,0.348,3.3246,4.7526,N,,08/2023,Incumbent


In [11]:
# Load the Work Context.txt file
work_context_data = pd.read_csv(DATA_PATH + "Work%20Context.txt", sep="\t", encoding='latin1')

# Display the first few rows of the work context data
print("Work Context Data Overview:")
work_context_data.head()

Work Context Data Overview:


Unnamed: 0,O*NET-SOC Code,Element ID,Element Name,Scale ID,Category,Data Value,N,Standard Error,Lower CI Bound,Upper CI Bound,Recommend Suppress,Not Relevant,Date,Domain Source
0,11-1011.00,4.C.1.a.2.c,Public Speaking,CX,,3.07,37.0,0.2851,2.4923,3.6486,N,,08/2023,Incumbent
1,11-1011.00,4.C.1.a.2.c,Public Speaking,CXP,1.0,0.13,37.0,0.137,0.016,1.077,N,,08/2023,Incumbent
2,11-1011.00,4.C.1.a.2.c,Public Speaking,CXP,2.0,39.49,37.0,11.0101,20.4073,62.4299,N,,08/2023,Incumbent
3,11-1011.00,4.C.1.a.2.c,Public Speaking,CXP,3.0,33.07,37.0,7.1359,20.4456,48.7245,N,,08/2023,Incumbent
4,11-1011.00,4.C.1.a.2.c,Public Speaking,CXP,4.0,7.79,37.0,4.3613,2.4093,22.4457,N,,08/2023,Incumbent


In [12]:
# Load the Work Activities.txt file
work_activities_data = pd.read_csv(DATA_PATH + "Work%20Activities.txt", sep="\t", encoding='latin1')

# Display the first few rows of the work activities data
print("Work Activities Data Overview:")
work_activities_data.head()

Work Activities Data Overview:


Unnamed: 0,O*NET-SOC Code,Element ID,Element Name,Scale ID,Data Value,N,Standard Error,Lower CI Bound,Upper CI Bound,Recommend Suppress,Not Relevant,Date,Domain Source
0,11-1011.00,4.A.1.a.1,Getting Information,IM,4.56,29.0,0.1559,4.2369,4.8756,N,,08/2023,Incumbent
1,11-1011.00,4.A.1.a.1,Getting Information,LV,4.89,30.0,0.1727,4.5393,5.2458,N,N,08/2023,Incumbent
2,11-1011.00,4.A.1.a.2,"Monitoring Processes, Materials, or Surroundings",IM,4.25,30.0,0.2125,3.813,4.6823,N,,08/2023,Incumbent
3,11-1011.00,4.A.1.a.2,"Monitoring Processes, Materials, or Surroundings",LV,5.21,30.0,0.3872,4.4133,5.9971,N,N,08/2023,Incumbent
4,11-1011.00,4.A.1.b.1,"Identifying Objects, Actions, and Events",IM,4.23,29.0,0.1544,3.918,4.5507,N,,08/2023,Incumbent



### Data Notes
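- Each dataset includes an `O*NET-SOC Code` column that identifies the occupation.
- The `Element Name` column gives the descriptive label for each skill, knowledge area, work activity, or work context item.
- `Data Value` holds the numeric rating, and `Scale ID` says which scale it is on: `IM` (importance) and `LV` (level) in the Knowledge and Work Activities outputs above, and `CX` and `CXP` in Work Context, where the `CXP` rows are broken out by `Category`.
- `Skills` and `Knowledge` carry both an importance and a level rating for each element, stored as separate rows rather than separate columns.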
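
Because importance and level ratings share the `Data Value` column and are told apart only by `Scale ID`, it can help to reshape the data so each scale gets its own column. The sketch below does this with `pivot_table` on four rows copied from the Knowledge output above; the variable names `sample` and `wide` are illustrative only, and the same call should work on the full `knowledge_data` frame.

```python
import pandas as pd

# Four rows copied from the Knowledge output above (occupation 11-1011.00)
sample = pd.DataFrame({
    "O*NET-SOC Code": ["11-1011.00"] * 4,
    "Element Name": [
        "Administration and Management",
        "Administration and Management",
        "Administrative",
        "Administrative",
    ],
    "Scale ID": ["IM", "LV", "IM", "LV"],
    "Data Value": [4.78, 6.5, 2.42, 2.69],
})

# One row per occupation and element, one column per scale (IM, LV)
wide = sample.pivot_table(
    index=["O*NET-SOC Code", "Element Name"],
    columns="Scale ID",
    values="Data Value",
).reset_index()
wide.columns.name = None  # drop the leftover "Scale ID" axis label

print(wide)
```

After the reshape, each element sits on a single row with `IM` and `LV` side by side, which is easier to filter and compare than the long format.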

Next, we will merge and analyze these data to calculate job similarities.
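
The similarity method is not specified at this point in the notebook. As a rough sketch only, one common choice is cosine similarity between occupations' rating vectors. The ratings below are invented toy numbers, not O*NET values, and the occupation and skill labels are placeholders.

```python
import numpy as np
import pandas as pd

# Toy importance ratings, invented for illustration only (not O*NET values).
# Rows are placeholder occupations, columns are placeholder skill elements.
ratings = pd.DataFrame(
    {"Skill A": [4.0, 4.0, 1.0], "Skill B": [3.0, 3.0, 5.0]},
    index=["job_x", "job_y", "job_z"],
)

def cosine_similarity_matrix(df):
    """Pairwise cosine similarity between the rows of df.

    Assumes no row is all zeros, since such a row has no direction.
    """
    values = df.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    unit = values / norms  # scale each row to unit length
    return pd.DataFrame(unit @ unit.T, index=df.index, columns=df.index)

similarity = cosine_similarity_matrix(ratings)
print(similarity.round(3))
```

Occupations with identical rating profiles score 1, and the score falls as the profiles diverge. Other measures, such as Euclidean distance, would be equally reasonable here.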
