# Preprocessing the 119 raw CBCL items for factor analysis in Mplus input format, following Michelini (102 items for EFA) and Romer (60 items for CFA)

More detailed information can be found in the following references:

- Michelini, G., Barch, D. M., Tian, Y., Watson, D., Klein, D. N., & Kotov, R. (2019). Delineating and validating higher-order dimensions of psychopathology in the Adolescent Brain Cognitive Development (ABCD) study. Translational psychiatry, 9(1), 261.

- Romer, A. L., & Pizzagalli, D. A. (2021). Is executive dysfunction a risk marker or consequence of psychopathology? A test of executive function as a prospective predictor and outcome of general psychopathology in the adolescent brain cognitive development study®. Developmental Cognitive Neuroscience, 51, 100994.

In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import csv_to_mplus
import warnings

warnings.filterwarnings('ignore')

In [108]:
# Save path for saving factor analysis input file
save_path = "/users/hjd/IG_my_study/SNUH/data/Mplus/"
abcd_cbcl_path = "/data5/open_data/ABCD/3.0/abcd_cbcl01.txt"
snuh_cbcl_path = "/users/hjd/data/SNUH_CBCL.xlsx"
sgnn_sbj_path = "/users/hjw/data/ABCD/npz_files/sbj_keys.npz"

romer_cfa_original_path = "/users/hjd/IG_my_study/SNUH/data/Mplus/Romer_cfa_before_column_ordering.csv"
romer_cfa_ordered_path = "/users/hjd/IG_my_study/SNUH/data/Mplus/Romer_cfa.csv"

In [3]:
sgnn_sbj = ["NDAR_"+i for i in np.load(sgnn_sbj_path,allow_pickle=True)['X']]
len(sgnn_sbj)

6905

## Bring CBCL & remove columns
### Michelini [119 -> 102]

In [4]:
cbcl = pd.read_csv(abcd_cbcl_path,sep='\t',low_memory=False)
eventname = cbcl.loc[0,'eventname']
sbj_key = cbcl.loc[0,'subjectkey']
cbcl.columns= cbcl.loc[0,:]
cbcl.rename(columns={sbj_key:'subjectkey',eventname:'eventname'},inplace=True)
cbcl.set_index('subjectkey',inplace=True)
cbcl = cbcl.iloc[1:,8:-2]
cbcl

Unnamed: 0_level_0,Acts too young for his/her age Actúa como si fuera mucho menor que su edad,Drinks alcohol without parents' approval Toma bebidas alcohólicas sin permiso de los padres,Argues a lot Discute mucho,Fails to finish things he/she starts Deja sin terminar lo que él/ella empieza,There is very little he/she enjoys Disfruta de muy pocas cosas,Bowel movements outside toilet Hace sus necesidades en la ropa o en lugares inadecuados,"Bragging, boasting Es engreído(a), presumido(a)","Can't concentrate, can't pay attention for long No puede concentrarse o prestar atención por mucho tiempo","Can't get his/her mind off certain thoughts; obsessions Obsesiones, que quiere decir que no puede sacarse de la mente ciertos pensamientos","Can't sit still, restless, or hyperactive No puede quedarse quieto(a); es inquieto(a) o hiperactivo(a)",...,Unusually loud Más ruidoso(a) de lo común,Uses drugs for non medical purposes (don't include alcohol or tobacco) Usa drogas sin motivo médico (no incluya alcohol o tabaco),"Vandalism Comete actos de vandalismo, como romper ventanas u otras cosas",Wets self during the day Se orina en la ropa durante el día,Wets the bed Se orina en la cama,Whining Se queja mucho,Wishes to be of opposite sex Desea ser del sexo opuesto,"Withdrawn, doesn't get involved with others Se aísla, no se relaciona con los demás",Worries Se preocupa mucho,eventname
subjectkey,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
NDAR_INV3B78K6LY,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,baseline_year_1_arm_1
NDAR_INVN9EX2F1A,0,0,0,0,0,0,0,1,0,1,...,0,0,0,0,0,0,0,0,0,baseline_year_1_arm_1
NDAR_INV49PA9KKY,1,0,2,1,0,0,0,0,0,0,...,0,0,0,0,0,1,0,1,2,baseline_year_1_arm_1
NDAR_INV48RC3H1N,1,0,1,0,0,0,1,0,1,1,...,0,0,0,0,0,0,0,0,1,1_year_follow_up_y_arm_1
NDAR_INVN4DBED7D,0,0,1,1,0,0,2,1,2,2,...,1,0,0,0,0,1,0,0,1,2_year_follow_up_y_arm_1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
NDAR_INV1925AD9X,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1_year_follow_up_y_arm_1
NDAR_INV1JXDFV9Z,1,0,0,1,0,0,0,1,1,1,...,0,0,0,0,0,0,0,0,0,baseline_year_1_arm_1
NDAR_INVTHZ39BBJ,0,0,0,0,0,0,0,0,1,1,...,0,0,0,0,0,0,0,0,0,1_year_follow_up_y_arm_1
NDAR_INVTUL4PTMG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2_year_follow_up_y_arm_1
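
The cell above relies on the first data row of the tab-delimited file holding the human-readable item descriptions. A minimal sketch of the same idea on a toy file is shown here; the function name `load_with_descriptions`, the toy rows, and the `keep` argument are illustrative assumptions, not part of the original pipeline.

```python
import io
import pandas as pd

# Toy stand-in for the tab-delimited file: the header row holds short element
# names and the first data row holds the item descriptions (assumed layout).
raw = io.StringIO(
    "subjectkey\tcbcl_q01_p\tcbcl_q03_p\teventname\n"
    "The NDAR Global Unique Identifier\tActs too young\tArgues a lot\tThe event name\n"
    "NDAR_A\t0\t2\tbaseline_year_1_arm_1\n"
    "NDAR_B\t1\t0\t1_year_follow_up_y_arm_1\n"
)

def load_with_descriptions(buf, keep=("subjectkey", "eventname")):
    """Use the description row as column labels, except for the columns
    listed in `keep`, which retain their short names."""
    df = pd.read_csv(buf, sep="\t", dtype=str)
    descriptions = df.iloc[0]
    labels = [col if col in keep else descriptions[col] for col in df.columns]
    df = df.iloc[1:].copy()          # drop the description row from the data
    df.columns = labels
    return df.set_index("subjectkey")

toy = load_with_descriptions(raw)
toy_items = toy.drop(columns="eventname").astype(float)
```

Selecting columns by name (`keep`) rather than by position avoids depending on how many administrative columns precede the items.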


In [5]:
# Keep only baseline data and the subjects used in the previous study
year0 = cbcl.loc[cbcl['eventname']=="baseline_year_1_arm_1",:]
cbcl_year0 = year0.loc[[i for i in sgnn_sbj if i in year0.index],:]
cbcl_year0.shape

(6905, 120)

### Step 1. Remove items whose frequency was too low (<0.5% rated as 1 or 2)

The following CBCL items were removed because of low frequency:

    "Drinks alcohol without parents' approval”,
    “Sexual problems”, “Smokes, chews, or sniffs tobacco”,
    “Truancy, skips school”,
    “Uses drugs for non-medical purposes (don't include alcohol or tobacco)”. 
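
The notebook takes this list from the paper rather than recomputing it. As a rough sketch of how the criterion could be checked on the data, assuming ratings coded 0/1/2 with missing values as NaN (the function name `low_frequency_items` and the toy data are hypothetical):

```python
import pandas as pd

def low_frequency_items(items: pd.DataFrame, threshold: float = 0.005) -> list:
    """Return the columns whose share of non-missing ratings equal to
    1 or 2 falls below `threshold` (0.005 corresponds to 0.5%)."""
    endorsed = (items >= 1).sum()    # NaN compares as False, so it is not counted
    answered = items.notna().sum()
    rate = endorsed / answered
    return rate[rate < threshold].index.tolist()

# Toy data: one commonly endorsed item and one endorsed by 1 in 1000 subjects.
toy = pd.DataFrame({
    "Argues a lot": [0, 1, 2, 0] * 250,
    "Truancy, skips school": [0] * 999 + [1],
})
flagged = low_frequency_items(toy)
print(flagged)  # ['Truancy, skips school']
```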

In [17]:
cbcl_col = cbcl_year0.columns.tolist()
remove_list = ["Drinks alcohol without parents' approval","Sexual problems",
               "Smokes, chews, or sniffs tobacco","Truancy, skips school",
    "Uses drugs for non medical purposes"]

for remove_col in remove_list:
    for origin_col in list(cbcl_col):  # iterate over a copy; removing from the list being looped over skips elements
        if remove_col in origin_col:
            print(origin_col)
            cbcl_col.remove(origin_col)


Drinks alcohol without parents' approval Toma bebidas alcohólicas sin permiso de los padres
Sexual problems Problemas sexuales
Smokes, chews, or sniffs tobacco Fuma, masca o inhala tabaco
Truancy, skips school Falta a la escuela sin motivo
Uses drugs for non medical purposes (don't include alcohol or tobacco) Usa drogas sin motivo médico (no incluya alcohol o tabaco)


In [19]:
cbcl_col.remove("eventname")
cbcl_year0 = cbcl_year0.loc[:,cbcl_col].astype(float)

cbcl_year0.shape

(6905, 114)

### Step 2. Aggregate highly correlated items into composites

To address high inter-item correlations, which can distort factor structure, we aggregated items that were highly correlated (polychoric r > .75) into composites by averaging scores and then rounding to the nearest integer, thus preserving the trichotomous rating.

The following composites were created:

    - Attacks/threatens (“Physically attacks people”, “Threatens people”);
    - Destroys (“Destroys his/her own things”,
            “Destroys things belonging to his/her family or others”, “Vandalism”);
    - Disobeys rules (“Disobedient at home”, “Disobedient at school”,
                “Breaks rules at home, school or elsewhere”);
    - Steals (“Steals at home”, “Steals outside the home”);
    - Peer problems (“Doesn't get along with other kids”, “Not liked by other kids”);
    - Distracted/Hyperactive (“Can't concentrate, can't pay attention for long”,
                “Inattentive or easily distracted”,
                “Can't sit still, restless, or hyperactive”);
    - Hallucinations (“Hears sound or voices that aren't there”,
                    “Sees things that aren't there”);
    - Sex play (“Plays with own sex parts in public”,
                “Plays with own sex parts too much”);
    - Weight problems (“Overeating”, “Overweight”).
    
    EFAs were run on the remaining 93 original and 9 composite items. 
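
One detail of "averaging and rounding" is worth making explicit: `np.round` rounds exact halves to the nearest even integer, so a two-item composite with ratings (0, 1) becomes 0 while (1, 2) becomes 2. The sketch below contrasts that with half-up rounding. The `composite` helper and its `half_up` flag are illustrative assumptions; which convention the original paper used is not stated here. The helper also uses `mean(axis=1)`, which skips missing ratings, whereas the notebook cells divide the row sum by the number of items.

```python
import numpy as np
import pandas as pd

def composite(items: pd.DataFrame, half_up: bool = False) -> pd.Series:
    """Average the component items row-wise and round to an integer rating."""
    mean = items.mean(axis=1)        # skips NaN, unlike sum(axis=1) / n_items
    if half_up:
        return np.floor(mean + 0.5)  # 0.5 -> 1, 1.5 -> 2
    return np.round(mean)            # NumPy rounds exact halves to the even integer

toy = pd.DataFrame({
    "Steals at home":          [0, 1, 1, 2],
    "Steals outside the home": [0, 0, 2, 2],
})
half_even = composite(toy)
half_up = composite(toy, half_up=True)
```

With three-item composites the mean never lands exactly on a half, so the two conventions only differ for the two-item composites.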

In [20]:
cbcl_col_list = cbcl_year0.columns.tolist()

attack = ["Physically attacks people", "Threatens people"]
destroy = ["Destroys his/her own things", 
            "Destroys things belonging to his/her family or others", 
            "Vandalism"]
disobey = ["Disobedient at home",
            "Disobedient at school",
            "Breaks rules at home, school or elsewhere"]
steal = ["Steals at home", "Steals outside the home"]
peer = ["Doesn't get along with other kids", "Not liked by other kids"]
distract = ["Can't concentrate, can't pay attention for long",
            "Inattentive or easily distracted",
            "Can't sit still, restless, or hyperactive"]
hallu = ["Hears sound or voices that aren't there", "Sees things that aren't there"]
sex = ["Plays with own sex parts in public", "Plays with own sex parts too much"]
weight = ["Overeating", "Overweight"]
full = [attack,destroy,disobey,distract,hallu,peer,sex,steal,weight]

attack_cbcl =[]
destroy_cbcl = []
disobey_cbcl = []
steal_cbcl = []
peer_cbcl = []
distract_cbcl = []
hallu_cbcl = []
sex_cbcl = []
weight_cbcl = []
full_cbcl = [attack_cbcl,destroy_cbcl,disobey_cbcl,distract_cbcl,hallu_cbcl,
             peer_cbcl,sex_cbcl,steal_cbcl,weight_cbcl]

idx = 0
for comp in full:
    for item in comp:
        for item2 in cbcl_col_list:
            if item in item2:
                full_cbcl[idx].append(item2)
    idx+=1

In [21]:
#Remove those columns
for i in range(len(full)):
    assert len(full[i]) == len(full_cbcl[i])
for idx in range(len(full)):
    for item in full_cbcl[idx]:
        cbcl_col_list.remove(item)
len(cbcl_col_list)

93

In [22]:
cbcl_year0_efa = cbcl_year0[cbcl_col_list].copy()  # explicit copy so the composite columns can be added safely

In [23]:
cbcl_year0_efa['Composite (Attachs/Threatens)'] = np.round(cbcl_year0[full_cbcl[0]].sum(1)/len(full_cbcl[0]),0)
cbcl_year0_efa['Composite (Destroys)'] = np.round(cbcl_year0[full_cbcl[1]].sum(1)/len(full_cbcl[1]),0)
cbcl_year0_efa['Composite (Disobeys rules)'] = np.round(cbcl_year0[full_cbcl[2]].sum(1)/len(full_cbcl[2]),0)
cbcl_year0_efa['Composite (Distracted/Hyperactive)'] = np.round(cbcl_year0[full_cbcl[3]].sum(1)/len(full_cbcl[3]),0)
cbcl_year0_efa['Composite (Hallucinations)'] = np.round(cbcl_year0[full_cbcl[4]].sum(1)/len(full_cbcl[4]),0)
cbcl_year0_efa['Composite (Peer problems)'] = np.round(cbcl_year0[full_cbcl[5]].sum(1)/len(full_cbcl[5]),0)
cbcl_year0_efa['Composite (Sex play)']= np.round(cbcl_year0[full_cbcl[6]].sum(1)/len(full_cbcl[6]),0)
cbcl_year0_efa['Composite (Steals)'] = np.round(cbcl_year0[full_cbcl[7]].sum(1)/len(full_cbcl[7]),0)
cbcl_year0_efa['Composite (Weight problems)'] = np.round(cbcl_year0[full_cbcl[8]].sum(1)/len(full_cbcl[8]),0)

cbcl_year0_efa.shape

(6905, 102)

## Rename columns (keep only the English text for convenience)

In [43]:
loadings = pd.read_excel("/users/hjd/data/ABCD_table/CBCL_loading_factors_from_michelini.xlsx",
                        engine='openpyxl',index_col=0, header=2)
new_col = loadings.index.tolist()[1:-1]
for idx,i in enumerate(new_col):
    new_col[idx]= new_col[idx].replace("high-strung","highstrung")
#     new_col[idx]= new_col[idx].replace("his/her","their")
#     new_col[idx]= new_col[idx].replace("he/she", "they")
#     new_col[idx]= new_col[idx].replace("him/her", "them")

len(new_col)

102

In [44]:
convert_col_dict = dict.fromkeys(cbcl_year0_efa.columns)  # original name -> new name, filled in below

In [48]:
# Map each original column to its English-only name; remained_col should end up empty
remained_col = new_col.copy()
for i in convert_col_dict.keys():
    for j in new_col:
        if j in i:
            convert_col_dict[i] = j.replace("his/her","their").replace("he/she", "they").replace("him/her", "them")
            remained_col.remove(j)
remained_col

[]

In [49]:
cbcl_year0_efa.rename(columns=convert_col_dict,inplace=True)
cbcl_year0_efa

Unnamed: 0_level_0,Acts too young for their age,Argues a lot,Fails to finish things they starts,There is very little they enjoys,Bowel movements outside toilet,"Bragging, boasting",Can't get their mind off certain thoughts; obsessions,Clings to adults or too dependent,Complains of loneliness,Confused or seems to be in a fog,...,Worries,Composite (Attachs/Threatens),Composite (Destroys),Composite (Disobeys rules),Composite (Distracted/Hyperactive),Composite (Hallucinations),Composite (Peer problems),Composite (Sex play),Composite (Steals),Composite (Weight problems)
subjectkey,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
NDAR_INV003RTV85,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
NDAR_INV00BD7VDC,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
NDAR_INV00LH735Y,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
NDAR_INV00LJVZK2,0.0,1.0,1.0,0.0,0.0,1.0,2.0,0.0,1.0,0.0,...,0.0,0.0,1.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0
NDAR_INV00R4TXET,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
NDAR_INVZZJ3A7BK,1.0,1.0,1.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0
NDAR_INVZZLZCKAY,0.0,2.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,...,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0
NDAR_INVZZNX6W2P,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
NDAR_INVZZZ2ALR6,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Romer [102 -> 60]

The following 10 items were excluded for cross-loading on more than one factor:

    “Secretive, keeps things to self”
    “Strange behavior,”
    “There is very little he/she enjoys,”
    “Unhappy, sad, or depressed,”
    “Unusually loud,”
    “Deliberately harms self or attempts suicide,”
    “Feels or complains that no one loves him/her,”
    “Impulsive or acts without thinking,”
    “Talks about killing self,”
    “Overtired without good reason.” 
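
Exclusions like these are applied in this notebook by substring matching against the column names. A minimal sketch of doing that without mutating a list while looping over it is given below; `drop_matching` is a hypothetical helper, and the toy column list is chosen so that two adjacent names match the same pattern.

```python
def drop_matching(columns, patterns):
    """Return (kept, dropped), dropping every column that contains
    any of `patterns` as a substring."""
    dropped = [c for c in columns if any(p in c for p in patterns)]
    kept = [c for c in columns if c not in dropped]
    return kept, dropped

cols = ["Strange behavior", "Strange ideas", "Unusually loud", "Worries"]
patterns = ["Strange", "Unusually loud"]
kept, dropped = drop_matching(cols, patterns)

# For comparison: removing from the list being iterated skips the element
# that slides into the freed position, so "Strange ideas" survives.
naive = list(cols)
for p in patterns:
    for c in naive:
        if p in c:
            naive.remove(c)
```

Returning the dropped names as well makes it easy to confirm that the number removed matches the number of items listed.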

In [88]:
cross_loading = ["Secretive, keeps things to self",
    "Strange behavior",
    "There is very little they enjoys",
    "Unhappy, sad, or depressed",
    "Unusually loud",
    "Deliberately harms self or attempts suicide",
    "Feels or complains that no one loves them",
    "Impulsive or acts without thinking",
    "Talks about killing self",
    "Overtired without good reason"] 

In [95]:
cfa_col = cbcl_year0_efa.columns.tolist()
print("Before removing : ",len(cfa_col))
print("======================")
for i in cross_loading:
    for j in list(cfa_col):  # iterate over a copy so removal does not skip elements
        if i in j:
            cfa_col.remove(j)
            print(j)
print("======================")
print("After removing : ",len(cfa_col))

Before removing :  102
Secretive, keeps things to self
Strange behavior
There is very little they enjoys
Unhappy, sad, or depressed
Unusually loud
Deliberately harms self or attempts suicide
Feels or complains that no one loves them
Impulsive or acts without thinking
Talks about killing self
Overtired without good reason
After removing :  92


The following 26 items were excluded for not loading on any factor:

    “Plays with own sex parts in public,”
    “Play with own sex parts too much,”
    “Overeating,”
    “Overweight,”
    “Hears sounds or voices that aren’t there,”
    “Sees things that aren’t there,”
    “Bowel movements outside toilet,”
    “Trouble sleeping,”
    “Wets self during the day,”
    “Wets the bed,”
    “Wishes to be of opposite sex,”
    “Clings to adult or too dependent,”
    “Cries a lot,”
    “Doesn’t eat well,”
    “Gets teased a lot,”
    “Bites fingernails,”
    “Nightmares,”
    “Constipated, doesn’t move bowels,”
    “Picks nose, skin, or other parts of body,”
    “Prefers being with older kids,”
    “Sleeps less than most kids,”
    “Sleeps more than most kids,”
    “Speech problem,”
    “Stores up too many things he/she doesn’t need,”
    “Talks or walks in sleep,”
    “Thumb-sucking.” 

In [96]:
not_loading = [
    "Plays with own sex parts in public",
    "Play with own sex parts too much",
    "Overeating",
    "Overweight",
    "Hears sound or voices that aren't there",
    "Sees things that aren’t there",
    "Bowel movements outside toilet",
    "Trouble sleeping",
    "Wets self during the day",
    "Wets the bed",
    "Wishes to be of opposite sex",
    "Clings to adults or too dependent",
    "Cries a lot",
    "Doesn't eat well",
    "Gets teased a lot",
    "Bites fingernails",
    "Nightmares",
    "Constipated, doesn't move bowels",
    "Picks nose, skin, or other parts of body",
    "Prefers being with older kids",
    "Sleeps less than most kids",
    "Sleeps more than most kids during day and/or night",
    "Speech problem",
    "Stores up too many things they doesn't need",
    "Talks or walks in sleep",
    "Thumb-sucking" ]

In [97]:
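# Exact-match removal: the six items already folded into composites
# (sex play, weight problems, hallucinations) are no longer in cfa_col,
# so only 20 of the 26 names are removed here. Their three composites
# are dropped in a later cell.
print("Before removing : ",len(cfa_col))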
for i in not_loading:
    if i in cfa_col:
        cfa_col.remove(i)
print("After removing : ",len(cfa_col))

Before removing :  92
After removing :  72


Six composite scores were created (Michelini et al., 2019):

    Composite 1 (“Physically attacks people” + “Threatens people”);
    
    Composite 2 (“Disobedient at home” + “Disobedient at school” + “Breaks rules at home, school, or elsewhere”);
    
    Composite 3 (“Destroys his/her own things” + “Destroys things belonging to his/her family or others” + “Vandalism”);
    
    Composite 4 (“Steals at home” + “Steals outside the home”);
    
    Composite 5 (“Doesn’t get along with other kids” + “Not liked by other kids”);
    
    Composite 6 (“Can’t concentrate, can’t pay attention for long” + “Inattentive or easily distracted” + “Can’t sit still, restless, or hyperactive”).

In [98]:
print("Before removing : ",len(cfa_col))
cfa_col.remove('Composite (Hallucinations)')
cfa_col.remove('Composite (Sex play)')
cfa_col.remove('Composite (Weight problems)')
print("After removing : ",len(cfa_col))

Before removing :  72
After removing :  69


We removed 3 additional CBCL items for cross-loading on more than one factor

    (“Showing off or clowning,” “Strange ideas,” and “Talks too much”),
    
4 items that did not have enough variability

    (“Thinks about sex too much,” “Sets fires,” “Self-conscious or easily embarrassed,” and “Cruel to animals”),
    
and one item that resulted in a non-positive definite solution

    (“Other physical problems without known physical cause”) at one or more of the waves.

This left a total of 60 CBCL items/composites entered into the CFAs at each wave.

In [99]:
remove_further=["Showing off or clowning",
                "Strange ideas",
                "Talks too much",
               "Thinks about sex too much",
                "Sets fires",
                "Self-conscious or easily embarrassed",
                "Cruel to animals",
               'Other (physical problems without known physical cause)',
               'Stares blankly']  # ninth item, not named in the text above; needed to go from 69 to 60

In [100]:
print("Before removing : ",len(cfa_col))
for i in remove_further:
    for j in list(cfa_col):  # iterate over a copy so removal does not skip elements
        if i in j:
            cfa_col.remove(j)
print("After removing : ",len(cfa_col))

Before removing :  69
After removing :  60


In [101]:
cbcl_year0_cfa = cbcl_year0_efa[cfa_col].dropna(how='all')  # drop subjects with no ratings at all
cbcl_year0_cfa.shape

(6905, 60)

## Order columns by factor score dimension
### (Externalizing, Internalizing, Neurodevelopmental, Somatoform, Detachment)

In [109]:
romer_cfa_col_origin = pd.read_csv(romer_cfa_original_path,sep='\t',index_col=0).columns
romer_cfa_col_ordered = pd.read_csv(romer_cfa_ordered_path,sep='\t',index_col=0).columns

In [106]:
romer_cfa_col_ordered

Index(['Composite (Attachs/Threatens)',
       'Cruelty, bullying, or meanness to others',
       'Composite (Disobeys rules)', 'Gets in many fights',
       'Temper tantrums or hot temper', 'Argues a lot', 'Composite (Destroys)',
       'Screams a lot', 'Doesn't seem to feel guilty after misbehaving',
       'Swearing or obscene language', 'Teases a lot', 'Composite (Steals)',
       'Stubborn, sullen, or irritable', 'Lying or cheating',
       'Runs away from home', 'Sudden changes in mood or feelings',
       'Easily jealous', 'Composite (Peer problems)', 'Suspicious',
       'Demands a lot of attention',
       'Hangs around with others who get in trouble',
       'Feels others are out to get him/her', 'Sulks a lot',
       'Bragging, boasting', 'Whining', 'Too fearful or anxious', 'Worries',
       'Feels he/she has to be perfect', 'Feels too guilty',
       'Nervous, high-strung, or tense',
       'Fears he/she might think or do something bad',
       'Feels worthless or inferior

In [110]:
for i in range(len(cbcl_year0_cfa.columns)):
    print(romer_cfa_col_origin[i])
    print(cbcl_year0_cfa.columns[i])
    print("==========================")

Acts too young for his/her age
Acts too young for their age
Argues a lot
Argues a lot
Fails to finish things he/she starts
Fails to finish things they starts
Bragging, boasting
Bragging, boasting
Can't get his/her mind off certain thoughts; obsessions
Can't get their mind off certain thoughts; obsessions
Complains of loneliness
Complains of loneliness
Confused or seems to be in a fog
Confused or seems to be in a fog
Cruelty, bullying, or meanness to others
Cruelty, bullying, or meanness to others
Daydreams or gets lost in his/her thoughts
Daydreams or gets lost in their thoughts
Demands a lot of attention
Demands a lot of attention
Doesn't seem to feel guilty after misbehaving
Doesn't seem to feel guilty after misbehaving
Easily jealous
Easily jealous
Fears certain animals, situations, or places, other than school
Fears certain animals, situations, or places, other than school
Fears going to school
Fears going to school
Fears he/she might think or do something bad
Fears they might thin

In [111]:
cbcl_year0_cfa.columns = romer_cfa_col_origin
cbcl_year0_cfa = cbcl_year0_cfa[romer_cfa_col_ordered]
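
Assigning `columns` positionally is only safe if both headers list the same items in the same order, which the printed comparison above checks by eye. A programmatic version of that check might look like the sketch below. The helper names and the pronoun normalization are assumptions made for illustration; the real headers may differ in other ways (for example hyphenation) that this sketch does not handle.

```python
import pandas as pd

PRONOUNS = (("his/her", "their"), ("he/she", "they"), ("him/her", "them"))

def normalize(name):
    """Map pronoun variants to one form so the two naming schemes compare equal."""
    for old, new in PRONOUNS:
        name = name.replace(old, new)
    return name

def relabel_and_order(df, positional_names, ordered_names):
    """Rename columns by position after verifying alignment, then reorder."""
    positional_names = list(positional_names)
    if len(positional_names) != df.shape[1]:
        raise ValueError("column count differs from the reference header")
    mismatched = [(a, b) for a, b in zip(df.columns, positional_names)
                  if normalize(a) != normalize(b)]
    if mismatched:
        raise ValueError(f"columns out of alignment: {mismatched}")
    out = df.copy()
    out.columns = positional_names
    return out[list(ordered_names)]

toy = pd.DataFrame({"Argues a lot": [1.0], "Feels they has to be perfect": [0.0]})
origin = ["Argues a lot", "Feels he/she has to be perfect"]
ordered = ["Feels he/she has to be perfect", "Argues a lot"]
result = relabel_and_order(toy, origin, ordered)
```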

## SNUH data for CFA

In [29]:
snuh = pd.read_excel(snuh_cbcl_path, engine="openpyxl",index_col=0)
snuh = snuh.iloc[1:]
snuh.rename(columns={'r_no':'subjectkey'},inplace=True)
snuh.set_index('subjectkey',drop=True,inplace=True)
snuh.dropna(how='all',inplace=True)
snuh.columns = year0.columns

In [30]:
snuh.head()

Unnamed: 0_level_0,Acts too young for their age,Drinks alcohol without parents' approval Toma bebidas alcohólicas sin permiso de los padres,Argues a lot Discute mucho,Fails to finish things they start,There is very little they enjoy,Bowel movements outside toilet Hace sus necesidades en la ropa o en lugares inadecuados,"Bragging, boasting Es engreído(a), presumido(a)","Can't concentrate, can't pay attention for long No puede concentrarse o prestar atención por mucho tiempo",Can't get their mind off certain thoughts; obsessions,"Can't sit still, restless, or hyperactive No puede quedarse quieto(a); es inquieto(a) o hiperactivo(a)",...,Unusually loud Más ruidoso(a) de lo común,Uses drugs for non medical purposes (don't include alcohol or tobacco) Usa drogas sin motivo médico (no incluya alcohol o tabaco),"Vandalism Comete actos de vandalismo, como romper ventanas u otras cosas",Wets self during the day Se orina en la ropa durante el día,Wets the bed Se orina en la cama,Whining Se queja mucho,Wishes to be of opposite sex Desea ser del sexo opuesto,"Withdrawn, doesn't get involved with others Se aísla, no se relaciona con los demás",Worries Se preocupa mucho,eventname
subjectkey,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
P1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
P2,0.0,1.0,2.0,1.0,0.0,0.0,0.0,1.0,2.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,
P3,1.0,,2.0,0.0,2.0,1.0,1.0,2.0,2.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,
P4,2.0,0.0,2.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
P5,2.0,0.0,2.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,


In [31]:
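snuh_col = snuh.columns.tolist()
snuh_col.remove('eventname')
# Drop the low-frequency items by substring match, as in Step 1,
# since the column names may carry extra text after the English item
snuh_col = [c for c in snuh_col if not any(r in c for r in remove_list)]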
for idx in range(len(full)):
    for item in full_cbcl[idx]:
        snuh_col.remove(item)

In [32]:
snuh_efa = snuh[snuh_col].copy()  # explicit copy avoids SettingWithCopyWarning when adding composites
snuh_efa['Composite (Attachs/Threatens)'] = np.round(snuh[full_cbcl[0]].sum(1)/len(full_cbcl[0]),0)
snuh_efa['Composite (Destroys)'] = np.round(snuh[full_cbcl[1]].sum(1)/len(full_cbcl[1]),0)
snuh_efa['Composite (Disobeys rules)'] = np.round(snuh[full_cbcl[2]].sum(1)/len(full_cbcl[2]),0)
snuh_efa['Composite (Distracted/Hyperactive)'] = np.round(snuh[full_cbcl[3]].sum(1)/len(full_cbcl[3]),0)
snuh_efa['Composite (Hallucinations)'] = np.round(snuh[full_cbcl[4]].sum(1)/len(full_cbcl[4]),0)
snuh_efa['Composite (Peer problems)'] = np.round(snuh[full_cbcl[5]].sum(1)/len(full_cbcl[5]),0)
snuh_efa['Composite (Sex play)']= np.round(snuh[full_cbcl[6]].sum(1)/len(full_cbcl[6]),0)
snuh_efa['Composite (Steals)'] = np.round(snuh[full_cbcl[7]].sum(1)/len(full_cbcl[7]),0)
snuh_efa['Composite (Weight problems)'] = np.round(snuh[full_cbcl[8]].sum(1)/len(full_cbcl[8]),0)


In [33]:
snuh_cfa = snuh_efa[cfa_col]
snuh_cfa.columns = romer_cfa_col_origin
snuh_cfa = snuh_cfa[romer_cfa_col_ordered]

In [34]:
snuh_efa.shape, snuh_cfa.shape

((147, 102), (147, 60))

In [35]:
abcd_snuh_y0 = pd.concat([cbcl_year0_cfa, snuh_cfa])
abcd_snuh_y0.shape

In [36]:
abcd_snuh_y0.shape

((10131, 60), (9545, 60), (8868, 60), (5308, 60))

In [37]:
# Save the processed files in Mplus input format using the custom csv_to_mplus module

csv_to_mplus.convert(savepath=save_path,filename="ABCD_year0_cfa",
                     data=cbcl_year0_cfa,allow_nan=True)

csv_to_mplus.convert(savepath=save_path,filename="SNUH_cfa",
                     data=snuh_cfa,allow_nan=True)

csv_to_mplus.convert(savepath=save_path,filename="ABCD+SNUH_year0_cfa",
                     data=abcd_snuh_y0,allow_nan=True)
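
The `csv_to_mplus` module is custom and its implementation is not shown here. As a rough idea of what such an export involves, the sketch below writes a headerless numeric file with a missing-value code and builds a matching input stub. Everything in it (`to_mplus`, the `y1..yN` short names, the `-999` code, `data.dat`) is a hypothetical stand-in, not the module's actual behavior. It assumes Mplus reads a data file without a header row and limits variable names to 8 characters.

```python
import io
import pandas as pd

def to_mplus(df, data_buf, data_name="data.dat", missing_code=-999):
    """Hypothetical exporter (not csv_to_mplus): write `df` as a headerless
    numeric file and return (name_map, input_stub)."""
    n = df.shape[1]
    short = [f"y{i + 1}" for i in range(n)]   # short names stand in for long item labels
    # The subject index is not written; ratings are written as integers.
    df.to_csv(data_buf, header=False, index=False,
              float_format="%.0f", na_rep=str(missing_code))
    stub = "\n".join([
        f"DATA: FILE IS {data_name};",
        f"VARIABLE: NAMES ARE y1-y{n};",
        f"  CATEGORICAL ARE y1-y{n};",
        f"  MISSING ARE ALL ({missing_code});",
    ])
    return dict(zip(short, df.columns)), stub

toy = pd.DataFrame({"Argues a lot": [0.0, float("nan")], "Worries": [1, 2]})
buf = io.StringIO()
name_map, stub = to_mplus(toy, buf)
```

Keeping `name_map` alongside the data file makes it possible to translate the short variable names in the Mplus output back to the CBCL item labels.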