### Description

Sociodemographic data encompasses social, economic, and demographic characteristics of study participants that influence health outcomes and provide important context for interpreting health data. This includes information about ancestry and migration history, education, employment, household composition, living conditions, and socioeconomic status. These factors are essential for understanding health disparities, environmental exposures, and social determinants of health.

### Introduction

The Human Phenotype Project collects data through online surveys in which participants voluntarily report on various aspects that influence their health. Sociodemographic data is captured through two lifestyle surveys (the Initial UKBB survey and the Follow-up UKBB survey) and the Initial Medical Survey.

### Measurement protocol
<!-- long measurement protocol for the data browser -->
The lifestyle surveys are modeled after the UK Biobank's touchscreen questionnaire. Participants receive the full version by email and complete it on an online platform, either before or after their baseline visit. A shorter follow-up version of the questionnaire is then completed during subsequent visits.

### Data availability 
<!-- for the example notebooks -->
The information is stored in two parquet files: `initial_medical.parquet`, which contains the Initial Medical Survey, and `ukbb.parquet`, which contains the two lifestyle surveys (the Initial UKBB survey and the Follow-up UKBB survey).
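
As a minimal sketch of how the two files might be located on disk, the helper below builds their paths with the standard library. The `sociodemographics/` folder is taken from the `relative_location` column of the data dictionary for `initial_medical.parquet`; placing `ukbb.parquet` in the same folder is an assumption, and both `table_paths` and the dataset root are hypothetical names used only for illustration.

```python
from pathlib import Path

# File names as given in the text. The shared folder is an assumption based on
# the relative_location listed for initial_medical.parquet in the data dictionary.
TABLE_FILES = {
    "initial_medical": "initial_medical.parquet",  # Initial Medical Survey
    "ukbb": "ukbb.parquet",  # Initial and Follow-up UKBB lifestyle surveys
}


def table_paths(dataset_root):
    """Return {table name: parquet path} under a hypothetical dataset root."""
    folder = Path(dataset_root) / "sociodemographics"
    return {name: folder / filename for name, filename in TABLE_FILES.items()}


paths = table_paths("/path/to/datasets")  # placeholder root, not a real location
for name, path in paths.items():
    print(name, "->", path)
    # With pandas and a parquet engine installed, a table could be read with:
    # df = pd.read_parquet(path)
```

In practice the `PhenoLoader` shown in the cells below handles file discovery, so a helper like this would only matter when reading the parquet files directly.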

### Summary of available data 
<!-- for the data browser -->
The sociodemographic dataset includes comprehensive information organized into several categories:

**Ancestry and Migration:**
- Participant's country of birth and year of immigration (aliya)
- Parents' countries of birth
- Grandparents' countries of birth (both maternal and paternal sides)

**Education and Qualifications:**
- Highest level of education attained
- Professional qualifications
- Age at completion of full-time education

**Employment:**
- Current employment status
- Duration in main current job (years and months)
- Length of working day
- Job characteristics (walking/standing, manual labor, shift work, night shifts)
- Commuting patterns (frequency, transport type, distance, duration)
- Remote work frequency

**Household and Living Conditions:**
- Type of accommodation
- Heating and air conditioning availability
- Length of residence at current address
- Household size and relationships between household members
- Number of vehicles in household
- Current and past pet ownership (type of pets)
- Residence area type (urban/rural)
- Assisted living status

**Socioeconomic Status:**
- Average total household income after tax
- Extra domestic job hours

**Family Connections:**
- Family members participating in the study (participant IDs and relationship degrees)
- Household members participating in the study

**Military Service (for Israeli population):**
- Past military service
- Position and duration of service

This comprehensive sociodemographic data enables researchers to examine how social and economic factors interact with health outcomes and to account for these variables in analyses.

### Relevant links

* [Pheno Knowledgebase](https://knowledgebase.pheno.ai/datasets/053-sociodemographics.html)
* [Pheno Data Browser](https://pheno-demo-app.vercel.app/folder/53)


In [1]:
#| echo: false
import pandas as pd
pd.set_option("display.max_rows", 500)

In [2]:
from pheno_utils import PhenoLoader

In [3]:
pl = PhenoLoader('sociodemographics')
pl

PhenoLoader for sociodemographics with
52 fields
3 tables: ['initial_medical', 'ukbb', 'age_sex']

# Data dictionary

In [4]:
pl.dict

tabular_field_name,folder_id,feature_set,field_string,relative_location,bulk_file_extension,bulk_dictionary,description_string,field_type,pandas_dtype,stability,units,sampling_rate,strata,sexed,debut,completed,data_coding,array
collection_timestamp,53,initial_medical,Collection timestamp,sociodemographics/initial_medical.parquet,,,Timestamp of measurements collection,Datetime,"datetime64[ns, Asia/Jerusalem]",Accruing,Time,,Collection time,Both sexes,2/26/2020,,,Single
collection_date,53,initial_medical,Collection date,sociodemographics/initial_medical.parquet,,,Date of measurments collection,Date,datetime64[ns],Accruing,Time,,Collection time,Both sexes,2/26/2020,,,Single
timezone,53,initial_medical,Timezone,sociodemographics/initial_medical.parquet,,,Timezone of the measurments,Categorical (single),category,Accruing,,,Collection time,Both sexes,2/26/2020,,,Single
DOB,53,initial_medical,DOB,sociodemographics/initial_medical.parquet,,,Birth year,Integer,int,Accruing,year,,Primary,Both sexes,2/26/2020,,,Single
country_of_birth,53,initial_medical,Country of birth,sociodemographics/initial_medical.parquet,,,Birth Land,Categorical (single),int,Accruing,,,Primary,Both sexes,2/26/2020,,053_01,Single
year_of_aliya,53,initial_medical,Year of aliya,sociodemographics/initial_medical.parquet,,,Year of aliyah to Israel,Integer,int,Accruing,year,,Primary,Both sexes,2/26/2020,,,Single
father_country_of_birth,53,initial_medical,Father country of birth,sociodemographics/initial_medical.parquet,,,Grandfather's country of birth on father's side,Categorical (single),int,Accruing,,,Primary,Both sexes,2/26/2020,,053_01,Single
grandfather_country_of_birth_father_side,53,initial_medical,Grandfather country of birth father side,sociodemographics/initial_medical.parquet,,,Grandmother's country of birth on father's side,Categorical (single),int,Accruing,,,Primary,Both sexes,2/26/2020,,053_01,Single
grandmother_country_of_birth_father_side,53,initial_medical,Grandmother country of birth father side,sociodemographics/initial_medical.parquet,,,Grandmother's Birth Land,Categorical (single),int,Accruing,,,Primary,Both sexes,2/26/2020,,053_01,Single
mother_country_of_birth,53,initial_medical,Mother country of birth,sociodemographics/initial_medical.parquet,,,Country of your mother,Categorical (single),int,Accruing,,,Primary,Both sexes,2/26/2020,,053_01,Single
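
Several of the fields above are typed `Categorical (single)` but stored with an `int` dtype and a `data_coding` identifier such as `053_01`, which suggests their values are integer codes that need a lookup table to interpret. The sketch below shows one way to list such fields. It works on a hand-copied excerpt of the dictionary, reduced to four columns, so it runs without access to the dataset; `coded_fields` is a hypothetical helper, not part of `pheno_utils`.

```python
import csv
import io

# Excerpt of the data dictionary above, reduced to four columns for brevity.
DICT_EXCERPT = """\
tabular_field_name,field_type,pandas_dtype,data_coding
DOB,Integer,int,
country_of_birth,Categorical (single),int,053_01
year_of_aliya,Integer,int,
father_country_of_birth,Categorical (single),int,053_01
mother_country_of_birth,Categorical (single),int,053_01
"""


def coded_fields(dictionary_csv):
    """Map each field that has a data coding to its coding identifier."""
    reader = csv.DictReader(io.StringIO(dictionary_csv))
    return {
        row["tabular_field_name"]: row["data_coding"]
        for row in reader
        if row["data_coding"]  # an empty string means no coding is attached
    }


coded = coded_fields(DICT_EXCERPT)
for name, coding in coded.items():
    print(f"{name}: values follow data coding {coding}")
```

Since `pl.dict` displays as a pandas DataFrame, the same selection could presumably be made on the full dictionary by keeping the rows where the `data_coding` column is not null.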
