
Not enough rows in the data downloaded from http://www.ebi.ac.uk/gwas/api/search/downloads/alternative #1

Open
stankiewicz565 opened this issue Mar 30, 2021 · 1 comment

Comments

@stankiewicz565
Owner

Before going through readr, the downloaded file alternative has 251402 lines. When I run read_tsv on it, I get the message shown below. I still get a tibble, but with only 33944 rows. Even adding the 1345 rows that could not be parsed gives only 35289 rows, a far cry from 212000. I ran problems() and got a tibble with the 1345 rows that could not be parsed, along with the reason each one failed. The reasons were the following:
"no trailing characters"
"a double"
"delimiter or quote"
"closing quote at end of file"
"38 columns"

#################################################################################
ERROR message:

gwascat_2021.03.30 = readr::read_tsv('alternative')

── Column specification ───────────────────────────────────────────────────
cols(
  .default = col_character(),
  `DATE ADDED TO CATALOG` = col_date(format = ""),
  PUBMEDID = col_double(),
  DATE = col_date(format = ""),
  CHR_POS = col_double(),
  UPSTREAM_GENE_DISTANCE = col_double(),
  DOWNSTREAM_GENE_DISTANCE = col_double(),
  MERGED = col_double(),
  SNP_ID_CURRENT = col_double(),
  INTERGENIC = col_double(),
  `P-VALUE` = col_double(),
  PVALUE_MLOG = col_double(),
  `OR or BETA` = col_double()
)
Use spec() for the full column specifications.

Warning: 1345 parsing failures.
 row            col               expected      actual          file
1199 SNP_ID_CURRENT no trailing characters 115231908-C 'alternative'
1200 SNP_ID_CURRENT no trailing characters 114539181-G 'alternative'
1201 SNP_ID_CURRENT no trailing characters     12938-C 'alternative'
1202 SNP_ID_CURRENT no trailing characters  10518889-A 'alternative'
1203 SNP_ID_CURRENT no trailing characters 114747981-C 'alternative'
.... .............. ...................... ........... .............
See problems(...) for more details.
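
A possible workaround within readr, sketched under assumptions. The "closing quote at end of file" reason suggests an unbalanced double quote somewhere in the file, which can make the parser treat many physical lines as one quoted field. The "no trailing characters" failures come from SNP_ID_CURRENT being guessed as a double while values such as 115231908-C are not numeric. The sketch below builds a tiny made-up file (the toy rows and the tmp path are hypothetical, not the real download) and reads it with quoting disabled and every column as character. Whether this recovers every row of the real file has not been checked here.

```r
library(readr)

# Toy stand-in for the downloaded file (hypothetical data).
# Data row 2 has an unmatched double quote; data row 3 has a
# non-numeric SNP_ID_CURRENT, like the values in the warning above.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "SNPS\tSNP_ID_CURRENT\tSTUDY",
  "rs1\t123\tstudy A",
  "rs2\t456\tthe \"unclosed study",
  "rs3\t789-C\tstudy C",
  "rs4\t1011\tstudy D"
), tmp)

# Physical line count, to compare against the parsed row count
n_lines <- length(readLines(tmp))

gwas <- read_tsv(
  tmp,
  quote = "",                                   # treat " as an ordinary character
  col_types = cols(.default = col_character())  # no type guessing
)

nrow(gwas)       # rows parsed
n_lines - 1      # rows expected (lines minus the header)
problems(gwas)   # remaining parsing failures, if any
```

Numeric columns can be converted afterwards, for example with as.numeric() on the columns that really are numeric, once all rows are in.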

@vjcitn
Collaborator

vjcitn commented Mar 30, 2021

I think this was resolved via data.table::fread.
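
The exact call that was used is not recorded in this thread, so the following is only a minimal sketch of what reading such a file with data.table::fread might look like. The toy file and the tmp path are hypothetical, and quote = "" and colClasses = "character" are assumptions chosen to sidestep the quoting and type-guessing problems reported above, not settings confirmed by anyone here.

```r
library(data.table)

# Toy stand-in for the downloaded file (hypothetical data), with an
# unmatched double quote and a non-numeric SNP_ID_CURRENT value.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "SNPS\tSNP_ID_CURRENT\tSTUDY",
  "rs1\t123\tstudy A",
  "rs2\t456\tthe \"unclosed study",
  "rs3\t789-C\tstudy C",
  "rs4\t1011\tstudy D"
), tmp)

gwas_dt <- fread(
  tmp,
  sep = "\t",
  header = TRUE,
  quote = "",                # assumption: disable quote handling entirely
  colClasses = "character"   # assumption: read every column as text
)

nrow(gwas_dt)                          # rows parsed
length(readLines(tmp)) - 1             # rows expected (lines minus the header)
```

Comparing the two counts on the real file is a quick way to confirm that no rows were dropped.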
