### Description

Population characteristics of the Human Phenotype Project study participants.

### Introduction  

Population characteristics are fundamental demographic data that provide essential context for all other measurements and analyses in the Human Phenotype Project. This basic information about participants includes their sex and date of birth, which are crucial for:

- Understanding the composition and representativeness of the study cohort
- Calculating age at time of measurement for longitudinal analyses
- Accounting for sex-specific biological differences in health outcomes
- Enabling demographic stratification in research analyses
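
As a rough illustration of the age calculation mentioned above, the sketch below derives an approximate age at a visit from the month and year of birth. It is not the project's own method. The toy values, the `participant_id` and `visit_date` column names, and the assumption that `month_of_birth` holds numeric months (1 to 12) are all hypothetical. Because the day of birth is not available, the result is only accurate to within about a month.

```python
import pandas as pd

# Toy stand-in for the population table; all values are invented for illustration.
population = pd.DataFrame(
    {
        "participant_id": [1, 2, 3],      # hypothetical ID column name
        "month_of_birth": [3, 11, 7],     # assumed to be numeric months, 1-12
        "year_of_birth": [1975, 1988, 1962],
    }
)

# Hypothetical measurement dates, one per participant.
visits = pd.DataFrame(
    {
        "participant_id": [1, 2, 3],
        "visit_date": pd.to_datetime(["2021-03-15", "2022-06-01", "2020-07-20"]),
    }
)


def approximate_age(df: pd.DataFrame) -> pd.Series:
    """Age in years at the visit, computed from whole months (day of birth unknown)."""
    months = (
        (df["visit_date"].dt.year - df["year_of_birth"]) * 12
        + (df["visit_date"].dt.month - df["month_of_birth"].astype(int))
    )
    return (months / 12).round(1)


# Attach birth information to each visit, then compute the age at that visit.
merged = visits.merge(population, on="participant_id", how="left")
merged["age_at_visit"] = approximate_age(merged)
print(merged[["participant_id", "visit_date", "age_at_visit"]])
```

If the month is stored as a label rather than a number, it would need to be mapped to 1 to 12 before this calculation.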

The Human Phenotype Project maintains strict separation between personally identifiable information (contact details, full names) and research data, using unique participant IDs to link all measurements while protecting privacy.

### Measurement protocol 
<!-- long measurement protocol for the data browser -->
Upon registration to the Human Phenotype Project, each participant is assigned a registration code, which serves as their ID in the study, and provides a telephone number and email address through which all communication is conducted. Participants are asked for their date of birth and sex, and are asked to schedule a visit to the assessment center.

Personal and contact data are stored in a secure environment, separately from the population characteristics, which are stored under the participant's designated ID.
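
The separation described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the project's actual storage design: the registration record, the `participant_id` column, and the contact field names are all hypothetical.

```python
import pandas as pd

# Hypothetical registration records; every value here is invented.
registration = pd.DataFrame(
    {
        "participant_id": [101, 102],
        "email": ["a@example.org", "b@example.org"],
        "phone": ["000-0001", "000-0002"],
        "sex": ["Female", "Male"],
        "month_of_birth": [5, 12],
        "year_of_birth": [1980, 1971],
    }
)

CONTACT_FIELDS = ["email", "phone"]
RESEARCH_FIELDS = ["sex", "month_of_birth", "year_of_birth"]

# Contact details and research data go to separate tables that share only the ID.
contact = registration[["participant_id"] + CONTACT_FIELDS]
research = registration[["participant_id"] + RESEARCH_FIELDS]

# The research table must not carry any contact field.
leaked = set(CONTACT_FIELDS) & set(research.columns)
print("contact fields in research table:", leaked)
```

Only the ID is shared, so research tables can be joined to each other without ever touching the contact table.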

### Data availability
<!-- for the example notebooks -->
* `population.parquet` - contains the study ID, sex, and month and year of birth for each participant

### Summary of available data 
<!-- for the data browser -->
1. Study ID - the ID of the study.
2. Date of birth - only the month and year of birth are available.
3. Sex - the sex of the participant.

### Relevant links

* [Pheno Knowledgebase](https://knowledgebase.pheno.ai/datasets/000-population.html)
* [Pheno Data Browser](https://pheno-demo-app.vercel.app/folder/1)

In [1]:
#| echo: false
import pandas as pd
pd.set_option("display.max_rows", 500)

In [2]:
from pheno_utils import PhenoLoader

In [3]:
pl = PhenoLoader('population')
pl

PhenoLoader for population with
4 fields
1 tables: ['population']

# Data dictionary

In [4]:
pl.dict

| tabular_field_name | field_string | description_string | folder_id | feature_set | field_type | strata | data_coding | array | pandas_dtype | bulk_file_extension | relative_location | units | bulk_dictionary | sampling_rate | transformation | list_of_tags | stability | sexed | debut | completed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| month_of_birth | Month of birth | Month of birth | 1 | population | Categorical (single) | Primary | 042_03 | Single | category | | population/population.parquet | | | | | | Accruing | Both sexes | 2019-01-01 | |
| year_of_birth | Year of birth | Year of birth | 1 | population | Integer | Primary | | Single | int | | population/population.parquet | | | | | | Accruing | Both sexes | 2019-01-01 | |
| sex | Sex | Sex | 1 | population | Categorical (single) | Primary | 9 | Single | category | | population/population.parquet | | | | | | Accruing | Both sexes | 2019-01-01 | |
| study_id | Study ID | The study identifier | 1 | population | Categorical (single) | Auxiliary | 000_01 | Single | category | | population/population.parquet | | | | | | Accruing | Both sexes | 2019-01-01 | |
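
The fields listed in the data dictionary are enough for a simple demographic stratification. The sketch below uses a toy table rather than the real data. The `sex` labels ("Female", "Male") and all values are assumptions made for illustration, since the actual category labels depend on the data coding.

```python
import pandas as pd

# Toy table using the field names from the data dictionary; values are invented.
population = pd.DataFrame(
    {
        "sex": pd.Categorical(["Female", "Male", "Female", "Male", "Female"]),
        "year_of_birth": [1965, 1972, 1978, 1981, 1989],
    }
)

# Group birth years into decades (e.g. 1972 -> 1970).
population["birth_decade"] = (population["year_of_birth"] // 10) * 10

# Cross-tabulate participant counts by sex and birth decade.
counts = pd.crosstab(population["sex"], population["birth_decade"])
print(counts)
```

The same table could be built from age bands instead of birth decades once an age at measurement has been computed.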
