Fix timestamp / -> - #2075
Merged
6 commits
180 changes: 102 additions & 78 deletions
qiita_db/metadata_template/test/test_sample_template.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
-- Feb 9, 2017 | ||
-- changing format of stored timestamps | ||
-- see python patch | ||
|
||
SELECT 1; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
from future.utils import viewitems | ||
from datetime import datetime | ||
|
||
from qiita_db.metadata_template.constants import ( | ||
    SAMPLE_TEMPLATE_COLUMNS, PREP_TEMPLATE_COLUMNS, | ||
    PREP_TEMPLATE_COLUMNS_TARGET_GENE) | ||
from qiita_db.metadata_template.prep_template import PrepTemplate | ||
from qiita_db.metadata_template.sample_template import SampleTemplate | ||
from qiita_db.sql_connection import TRN | ||
|
||
|
||
# getting columns in each info file that we need to check for | ||
cols_sample = [col | ||
               for key, vals in viewitems(SAMPLE_TEMPLATE_COLUMNS) | ||
               for col, dt in viewitems(vals.columns) if dt == datetime] | ||
cols_prep = [col | ||
             for key, vals in viewitems(PREP_TEMPLATE_COLUMNS) | ||
             for col, dt in viewitems(vals.columns) if dt == datetime] + [ | ||
    col | ||
    for key, vals in viewitems(PREP_TEMPLATE_COLUMNS_TARGET_GENE) | ||
    for col, dt in viewitems(vals.columns) if dt == datetime] | ||
|
||
|
||
def transform_date(value): | ||
    # for the way the patches are applied we need to have this import and | ||
    # the next 2 variables within this function | ||
    from datetime import datetime | ||
|
||
    # old format : new format | ||
    formats = { | ||
        # 4 digits year | ||
        '%m/%d/%Y %H:%M:%S': '%Y-%m-%d %H:%M:%S', | ||
        '%m-%d-%Y %H:%M': '%Y-%m-%d %H:%M', | ||
        '%m/%d/%Y %H': '%Y-%m-%d %H', | ||
        '%m-%d-%Y': '%Y-%m-%d', | ||
        '%m-%Y': '%Y-%m', | ||
        '%Y': '%Y', | ||
        # 2 digits year | ||
        '%m/%d/%y %H:%M:%S': '%Y-%m-%d %H:%M:%S', | ||
        '%m-%d-%y %H:%M': '%Y-%m-%d %H:%M', | ||
        '%m/%d/%y %H': '%Y-%m-%d %H', | ||
        '%m-%d-%y': '%Y-%m-%d', | ||
        '%m-%y': '%Y-%m', | ||
        '%y': '%Y' | ||
    } | ||
|
||
    # loop over the old formats to find the one that matches | ||
    date = None | ||
    for fmt in formats: | ||
        try: | ||
            date = datetime.strptime(value, fmt) | ||
            break | ||
        except ValueError: | ||
            pass | ||
    if date is not None: | ||
        value = date.strftime(formats[fmt]) | ||
    return value | ||
|
||
if cols_sample: | ||
    with TRN: | ||
        # a few notes: just getting the preps with duplicated values; ignoring | ||
        # column 'sample_id' and tables 'study_sample', 'prep_template', | ||
        # 'prep_template_sample' | ||
        sql = """SELECT table_name, array_agg(column_name::text) | ||
                    FROM information_schema.columns | ||
                    WHERE column_name IN %s | ||
                        AND table_name LIKE 'sample_%%' | ||
                        AND table_name NOT IN ( | ||
                            'prep_template', 'prep_template_sample') | ||
                    GROUP BY table_name""" | ||
        # note that we are looking for those columns with duplicated names in | ||
        # the headers | ||
        TRN.add(sql, [tuple(set(cols_sample))]) | ||
        for table, columns in viewitems(dict(TRN.execute_fetchindex())): | ||
            # [1] the format is table_# so taking the # | ||
            st = SampleTemplate(int(table.split('_')[1])) | ||
            # getting just the columns of interest | ||
            st_df = st.to_dataframe()[columns] | ||
            # converting to datetime | ||
            for col in columns: | ||
                st_df[col] = st_df[col].apply(transform_date) | ||
            st.update(st_df) | ||
|
||
if cols_prep: | ||
    with TRN: | ||
        # a few notes: just getting the preps with duplicated values; ignoring | ||
        # column 'sample_id' and tables 'study_sample', 'prep_template', | ||
        # 'prep_template_sample' | ||
        sql = """SELECT table_name, array_agg(column_name::text) | ||
                    FROM information_schema.columns | ||
                    WHERE column_name IN %s | ||
                        AND table_name LIKE 'prep_%%' | ||
                        AND table_name NOT IN ( | ||
                            'prep_template', 'prep_template_sample') | ||
                    GROUP BY table_name""" | ||
        # note that we are looking for those columns with duplicated names in | ||
        # the headers | ||
        TRN.add(sql, [tuple(set(cols_prep))]) | ||
        for table, columns in viewitems(dict(TRN.execute_fetchindex())): | ||
            # [1] the format is table_# so taking the # | ||
            pt = PrepTemplate(int(table.split('_')[1])) | ||
            # getting just the columns of interest | ||
            pt_df = pt.to_dataframe()[columns] | ||
            # converting to datetime | ||
            for col in columns: | ||
                pt_df[col] = pt_df[col].apply(transform_date) | ||
            pt.update(pt_df) |
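A quick way to sanity-check the conversion outside the patch machinery is sketched below. It is not part of the PR: it copies only three entries of the mapping, the sample inputs are made up, and it assumes Python 3.7+ so that the dict is tried in insertion order.

```python
from datetime import datetime

# Illustrative subset of the patch's old -> new format mapping.
FORMATS = {
    '%m/%d/%Y %H:%M:%S': '%Y-%m-%d %H:%M:%S',
    '%m-%d-%Y': '%Y-%m-%d',
    '%m-%d-%y': '%Y-%m-%d',  # 2-digit year is expanded to 4 digits
}


def transform_date(value):
    """Return value rewritten in the new format, or unchanged if no
    old format matches."""
    for old_fmt, new_fmt in FORMATS.items():
        try:
            return datetime.strptime(value, old_fmt).strftime(new_fmt)
        except ValueError:
            continue
    return value


# Made-up inputs: slash-separated, dash-separated, 2-digit year, free text.
samples = ['11/11/2011 13:00:00', '02-09-2017', '02-09-17', 'unknown']
converted = [transform_date(s) for s in samples]
print(converted)
```

Values that match none of the old formats are passed through untouched, which is the same fallback the patch relies on for free-text entries.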
8 changes: 4 additions & 4 deletions
qiita_db/support_files/test_data/analysis/1_analysis_mapping_exp.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
#SampleID BarcodeSequence LinkerPrimerSequence center_name center_project_name emp_status experiment_center experiment_design_description experiment_title illumina_technology instrument_model library_construction_protocol pcr_primers platform run_center run_date run_prefix samp_size sample_center sequencing_meth study_center target_gene target_subfragment qiita_prep_id altitude anonymized_name assigned_from_geo collection_timestamp common_name country depth description_duplicate elevation env_biome env_feature has_extracted_data has_physical_specimen host_subject_id host_taxid latitude longitude ph physical_location samp_salinity sample_type season_environment taxon_id temp texture tot_nitro tot_org_carb water_content_soil qiita_study_title qiita_study_alias qiita_owner qiita_principal_investigator Description | ||
1.SKB8.640193 AGCGCTCACATC GTGCCAGCMGCCGCGGTAA ANL EMP ANL micro biome of soil and rhizosphere of cannabis plants from CA Cannabis Soil Microbiome MiSeq Illumina MiSeq This analysis was done as in Caporaso et al 2011 Genome research. The PCR primers (F515/R806) were developed against the V4 region of the 16S rRNA (both bacteria and archaea), which we determined would yield optimal community clustering with reads of this length using a procedure similar to that of ref. 15. [For reference, this primer pair amplifies the region 533_786 in the Escherichia coli strain 83972 sequence (greengenes accession no. prokMSA_id:470367).] The reverse PCR primer is barcoded with a 12-base error-correcting Golay code to facilitate multiplexing of up to 1,500 samples per lane, and both PCR primers contain sequencer adapter regions. FWD:GTGCCAGCMGCCGCGGTAA; REV:GGACTACHVGGGTWTCTAAT Illumina ANL 8/1/12 s_G1_L001_sequences .25,g ANL Sequencing by synthesis CCME 16S rRNA V4 1 0.0 SKB8 n 2011-11-11 13:00:00 root metagenome GAZ:United States of America 0.15 Burmese root 114.0 ENVO:Temperate grasslands, savannas, and shrubland biome ENVO:plant-associated habitat True True 1001:M7 3483 74.0894932572 65.3283470202 6.94 ANL 7.15 ENVO:soil winter 1118232 15.0 64.6 sand, 17.6 silt, 17.8 clay 1.41 5.0 0.164 Identification of the Microbiomes for Cannabis Soils Cannabis Soils Dude PIDude Cannabis Soil Microbiome | ||
1.SKD8.640184 TGAGTGGTCTGT GTGCCAGCMGCCGCGGTAA ANL EMP ANL micro biome of soil and rhizosphere of cannabis plants from CA Cannabis Soil Microbiome MiSeq Illumina MiSeq This analysis was done as in Caporaso et al 2011 Genome research. The PCR primers (F515/R806) were developed against the V4 region of the 16S rRNA (both bacteria and archaea), which we determined would yield optimal community clustering with reads of this length using a procedure similar to that of ref. 15. [For reference, this primer pair amplifies the region 533_786 in the Escherichia coli strain 83972 sequence (greengenes accession no. prokMSA_id:470367).] The reverse PCR primer is barcoded with a 12-base error-correcting Golay code to facilitate multiplexing of up to 1,500 samples per lane, and both PCR primers contain sequencer adapter regions. FWD:GTGCCAGCMGCCGCGGTAA; REV:GGACTACHVGGGTWTCTAAT Illumina ANL 8/1/12 s_G1_L001_sequences .25,g ANL Sequencing by synthesis CCME 16S rRNA V4 1 0.0 SKD8 n 2011-11-11 13:00:00 root metagenome GAZ:United States of America 0.15 Diesel Root 114.0 ENVO:Temperate grasslands, savannas, and shrubland biome ENVO:plant-associated habitat True True 1001:D9 3483 57.571893782 32.5563076447 6.8 ANL 7.1 ENVO:soil winter 1118232 15.0 66 sand, 16.3 silt, 17.7 clay 1.51 4.32 0.178 Identification of the Microbiomes for Cannabis Soils Cannabis Soils Dude PIDude Cannabis Soil Microbiome | ||
1.SKB7.640196 CGGCCTAAGTTC GTGCCAGCMGCCGCGGTAA ANL EMP ANL micro biome of soil and rhizosphere of cannabis plants from CA Cannabis Soil Microbiome MiSeq Illumina MiSeq This analysis was done as in Caporaso et al 2011 Genome research. The PCR primers (F515/R806) were developed against the V4 region of the 16S rRNA (both bacteria and archaea), which we determined would yield optimal community clustering with reads of this length using a procedure similar to that of ref. 15. [For reference, this primer pair amplifies the region 533_786 in the Escherichia coli strain 83972 sequence (greengenes accession no. prokMSA_id:470367).] The reverse PCR primer is barcoded with a 12-base error-correcting Golay code to facilitate multiplexing of up to 1,500 samples per lane, and both PCR primers contain sequencer adapter regions. FWD:GTGCCAGCMGCCGCGGTAA; REV:GGACTACHVGGGTWTCTAAT Illumina ANL 8/1/12 s_G1_L001_sequences .25,g ANL Sequencing by synthesis CCME 16S rRNA V4 1 0.0 SKB7 n 2011-11-11 13:00:00 root metagenome GAZ:United States of America 0.15 Burmese root 114.0 ENVO:Temperate grasslands, savannas, and shrubland biome ENVO:plant-associated habitat True True 1001:M8 3483 13.089194595 92.5274472082 6.94 ANL 7.15 ENVO:soil winter 1118232 15.0 64.6 sand, 17.6 silt, 17.8 clay 1.41 5.0 0.164 Identification of the Microbiomes for Cannabis Soils Cannabis Soils Dude PIDude Cannabis Soil Microbiome | ||
#SampleID BarcodeSequence LinkerPrimerSequence center_name emp_status experiment_center experiment_design_description experiment_title illumina_technology instrument_model library_construction_protocol pcr_primers platform run_center run_date run_prefix samp_size sample_center sequencing_meth study_center target_gene target_subfragment qiita_prep_id altitude anonymized_name assigned_from_geo collection_timestamp common_name country depth description_duplicate dna_extracted elevation env_biome env_feature host_subject_id host_taxid latitude longitude ph physical_specimen_location physical_specimen_remaining qiita_study_id samp_salinity sample_type scientific_name season_environment taxon_id temp texture tot_nitro tot_org_carb water_content_soil qiita_study_title qiita_study_alias qiita_owner qiita_principal_investigator Description | ||
1.SKB8.640193 AGCGCTCACATC GTGCCAGCMGCCGCGGTAA ANL EMP ANL micro biome of soil and rhizosphere of cannabis plants from CA Cannabis Soil Microbiome MiSeq Illumina MiSeq This analysis was done as in Caporaso et al 2011 Genome research. The PCR primers (F515/R806) were developed against the V4 region of the 16S rRNA (both bacteria and archaea), which we determined would yield optimal community clustering with reads of this length using a procedure similar to that of ref. 15. [For reference, this primer pair amplifies the region 533_786 in the Escherichia coli strain 83972 sequence (greengenes accession no. prokMSA_id:470367).] The reverse PCR primer is barcoded with a 12-base error-correcting Golay code to facilitate multiplexing of up to 1,500 samples per lane, and both PCR primers contain sequencer adapter regions. FWD:GTGCCAGCMGCCGCGGTAA; REV:GGACTACHVGGGTWTCTAAT Illumina ANL 8/1/12 s_G1_L001_sequences .25,g ANL Sequencing by synthesis CCME 16S rRNA V4 1 0 SKB8 n 2011-11-11 13:00:00 root metagenome GAZ:United States of America 0.15 Burmese root true 114 ENVO:Temperate grasslands, savannas, and shrubland biome ENVO:plant-associated habitat 1001:M7 3483 74.0894932572 65.3283470202 6.94 ANL true 1 7.15 ENVO:soil 1118232 winter 1118232 15 64.6 sand, 17.6 silt, 17.8 clay 1.41 5 0.164 Identification of the Microbiomes for Cannabis Soils Cannabis Soils Dude PIDude Cannabis Soil Microbiome | ||
1.SKD8.640184 TGAGTGGTCTGT GTGCCAGCMGCCGCGGTAA ANL EMP ANL micro biome of soil and rhizosphere of cannabis plants from CA Cannabis Soil Microbiome MiSeq Illumina MiSeq This analysis was done as in Caporaso et al 2011 Genome research. The PCR primers (F515/R806) were developed against the V4 region of the 16S rRNA (both bacteria and archaea), which we determined would yield optimal community clustering with reads of this length using a procedure similar to that of ref. 15. [For reference, this primer pair amplifies the region 533_786 in the Escherichia coli strain 83972 sequence (greengenes accession no. prokMSA_id:470367).] The reverse PCR primer is barcoded with a 12-base error-correcting Golay code to facilitate multiplexing of up to 1,500 samples per lane, and both PCR primers contain sequencer adapter regions. FWD:GTGCCAGCMGCCGCGGTAA; REV:GGACTACHVGGGTWTCTAAT Illumina ANL 8/1/12 s_G1_L001_sequences .25,g ANL Sequencing by synthesis CCME 16S rRNA V4 1 0 SKD8 n 2011-11-11 13:00:00 root metagenome GAZ:United States of America 0.15 Diesel Root true 114 ENVO:Temperate grasslands, savannas, and shrubland biome ENVO:plant-associated habitat 1001:D9 3483 57.571893782 32.5563076447 6.8 ANL true 1 7.1 ENVO:soil 1118232 winter 1118232 15 66 sand, 16.3 silt, 17.7 clay 1.51 4.32 0.178 Identification of the Microbiomes for Cannabis Soils Cannabis Soils Dude PIDude Cannabis Soil Microbiome | ||
1.SKB7.640196 CGGCCTAAGTTC GTGCCAGCMGCCGCGGTAA ANL EMP ANL micro biome of soil and rhizosphere of cannabis plants from CA Cannabis Soil Microbiome MiSeq Illumina MiSeq This analysis was done as in Caporaso et al 2011 Genome research. The PCR primers (F515/R806) were developed against the V4 region of the 16S rRNA (both bacteria and archaea), which we determined would yield optimal community clustering with reads of this length using a procedure similar to that of ref. 15. [For reference, this primer pair amplifies the region 533_786 in the Escherichia coli strain 83972 sequence (greengenes accession no. prokMSA_id:470367).] The reverse PCR primer is barcoded with a 12-base error-correcting Golay code to facilitate multiplexing of up to 1,500 samples per lane, and both PCR primers contain sequencer adapter regions. FWD:GTGCCAGCMGCCGCGGTAA; REV:GGACTACHVGGGTWTCTAAT Illumina ANL 8/1/12 s_G1_L001_sequences .25,g ANL Sequencing by synthesis CCME 16S rRNA V4 1 0 SKB7 n 2011-11-11 13:00:00 root metagenome GAZ:United States of America 0.15 Burmese root true 114 ENVO:Temperate grasslands, savannas, and shrubland biome ENVO:plant-associated habitat 1001:M8 3483 13.089194595 92.5274472082 6.94 ANL true 1 7.15 ENVO:soil 1118232 winter 1118232 15 64.6 sand, 17.6 silt, 17.8 clay 1.41 5 0.164 Identification of the Microbiomes for Cannabis Soils Cannabis Soils Dude PIDude Cannabis Soil Microbiome |
this block is rather confusing...
Any suggestions? Basically, it tests which of the old formats the date matches and, if one is found, rewrites the date using the new format. The new format doesn't accept 2-digit years, so if one of the 2-digit-year formats matches I'm simply reassigning it to the corresponding 4-digit one ...
Ah. What about a format mapping: `{old_fmt: new_format}`, and in the case of unacceptable, just specify `None` as the new format?
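One possible reading of that suggestion, as a rough sketch: keep the old-to-new mapping as ordered pairs and let `None` mark an old format that has no acceptable new form, in which case the value is left as it was. The names `FORMAT_PAIRS` and `convert`, and the choice of which entry maps to `None`, are hypothetical and not taken from the PR.

```python
from datetime import datetime

# Hypothetical (old_fmt, new_fmt) pairs, tried in order.
# new_fmt is None when the old format is recognised but should not be
# rewritten (the "unacceptable" case in the comment above).
FORMAT_PAIRS = [
    ('%m/%d/%Y %H:%M:%S', '%Y-%m-%d %H:%M:%S'),
    ('%m/%d/%Y', '%Y-%m-%d'),
    ('%m-%y', None),  # assumption: treated as unacceptable here
]


def convert(value, pairs=FORMAT_PAIRS):
    """Rewrite value using the first matching old format.

    Returns value unchanged when nothing matches or when the matching
    entry maps to None.
    """
    for old_fmt, new_fmt in pairs:
        try:
            parsed = datetime.strptime(value, old_fmt)
        except ValueError:
            continue
        return value if new_fmt is None else parsed.strftime(new_fmt)
    return value


print(convert('02/09/2017 13:00:00'))
print(convert('02/09/2017'))
print(convert('02-17'))
```

A list of pairs is used instead of a dict so the order in which formats are tried is explicit and does not depend on the dict implementation.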