I have an important question that I cannot resolve no matter how much of the literature I read. It concerns data normalization and the transformation of abundances. I have an abundance table of genes annotated with Pfam for 30 samples, which I normalized as RPKG (reads per kilobase per genome equivalent). I provided the metadata and ran FlashWeave with the default values, so it applies CLR normalization:
```julia
network = learn_network(abundance_table, metadata_table, sensitive=true, heterogeneous=false, transposed=true, n_obs_min=threshold)
```
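For reference, RPKG divides each gene's read count by the gene length in kilobases and by the sample's number of genome equivalents. A minimal sketch, assuming a samples × genes count matrix, gene lengths in base pairs, and per-sample genome-equivalent estimates that were computed beforehand; the function name `rpkg` is hypothetical and not part of FlashWeave:

```julia
# Hypothetical RPKG sketch: reads per kilobase of gene per genome equivalent.
# counts: samples x genes (rows = samples, columns = genes)
# gene_len_bp: one length in base pairs per gene (column)
# genome_equivalents: one estimate per sample (row), assumed precomputed
function rpkg(counts::AbstractMatrix{<:Real}, gene_len_bp::AbstractVector{<:Real},
              genome_equivalents::AbstractVector{<:Real})
    length_kb = gene_len_bp ./ 1000
    # length_kb' is a 1 x genes row (per-gene scaling);
    # genome_equivalents broadcasts down the rows (per-sample scaling)
    return counts ./ length_kb' ./ genome_equivalents
end
```

The per-gene factor (length) and the per-sample factor (genome equivalents) are kept separate here because they behave differently under a later log-ratio transform.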
Is it wrong to normalize twice? When we compute RPKG, are we already solving the compositionality problem, or is it still necessary to apply CLR? RPKG is used for inter-sample normalization and, as I understand it, CLR is a transformation that accounts for the compositional nature of the data. Is it correct to let FlashWeave apply CLR automatically to Pfam abundances that are already RPKG-normalized, or is it better to run FlashWeave with `normalize=false`?
```julia
network = learn_network(abundance_table, metadata_table, sensitive=true, heterogeneous=false, transposed=true, n_obs_min=threshold, normalize=false)
```
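To make the CLR question concrete: a plain centered log-ratio takes the log of each value and subtracts the mean log of its sample. Without a pseudocount, multiplying every value in a sample by the same positive factor leaves the result unchanged, so a purely per-sample scaling is cancelled by CLR. The sketch below is illustrative only: the fixed pseudocount is an assumption, and FlashWeave's own normalization may handle zeros differently.

```julia
using Statistics

# Plain centered log-ratio (CLR), one sample per row, features in columns.
# Assumption: zeros are handled with a fixed pseudocount; this is NOT
# necessarily how FlashWeave treats zeros internally.
function clr(X::AbstractMatrix{<:Real}; pseudocount::Real = 1.0)
    L = log.(X .+ pseudocount)
    # subtract each row's mean log value (geometric-mean centering)
    return L .- mean(L, dims = 2)
end
```

With a nonzero pseudocount the per-sample invariance holds only approximately, which is one reason results can differ depending on what scale the input is on.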
I have computed both networks and am comparing them. The global metrics barely change, but when I analyze the communities and the roles of the nodes (classifying them into connectors, peripherals, module hubs, and network hubs), the interpretation changes. Does anyone have an idea why?
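These node roles are usually assigned from two statistics: the within-module degree z-score (Zi) and the participation coefficient (Pi). A minimal sketch, assuming an undirected, unweighted adjacency matrix without self-loops and module labels from any community-detection step; the 2.5 / 0.62 cutoffs are the ones commonly used in microbial network studies and are not defined by FlashWeave. The names `zi_pi` and `node_role` are hypothetical.

```julia
using Statistics

# Zi: z-score of a node's links inside its own module, relative to its module.
# Pi: 1 - sum over modules s of (links to s / total degree)^2.
# Assumes A is a symmetric 0/1 matrix with a zero diagonal.
function zi_pi(A::AbstractMatrix{<:Real}, modules::AbstractVector)
    n = size(A, 1)
    k = vec(sum(A, dims = 2))  # total degree of each node
    # links from node i to nodes in its own module
    kin = [sum(A[i, j] for j in 1:n if modules[j] == modules[i]) for i in 1:n]
    z = zeros(n)
    for m in unique(modules)
        idx = findall(==(m), modules)
        mu = mean(kin[idx])
        sd = length(idx) > 1 ? std(kin[idx]) : 0.0
        for i in idx
            z[i] = sd > 0 ? (kin[i] - mu) / sd : 0.0  # undefined sd -> 0
        end
    end
    p = zeros(n)
    for i in 1:n
        k[i] == 0 && continue  # isolated node keeps P = 0
        s = 0.0
        for m in unique(modules)
            kis = sum(A[i, j] for j in 1:n if modules[j] == m)
            s += (kis / k[i])^2
        end
        p[i] = 1 - s
    end
    return z, p
end

# Role assignment with the commonly used (assumed) thresholds.
function node_role(z::Real, p::Real; zt::Real = 2.5, pt::Real = 0.62)
    if z > zt
        return p > pt ? "network hub" : "module hub"
    else
        return p > pt ? "connector" : "peripheral"
    end
end
```

Because both statistics are computed relative to the detected modules, any change in the community partition between two networks changes Zi and Pi directly, and nodes near a threshold can switch roles.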
Thanks in advance, Maria