Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

str_extract fails to convert to integer #48

Closed
danielecook opened this issue Feb 16, 2015 · 3 comments

Comments

@danielecook
Copy link

commented Feb 16, 2015

Hello - thanks for developing bigrquery it's been a fantastic help. I'm having an issue extracting data from a column containing comma separated data.

mutate(df , PL_REF = as.integer(str_extract(N2_PL,"([0-9]+),[0-9]+,[0-9]+")))
Source: bigquery [201573431837:bamsurgeon]
From: variants3 [?? x 31]
Filter: call_method != "jointtest" 

Error: Encountered " <STRING_LITERAL> "\'([0-9]+),[0-9]+,[0-9]+\' "" at line 1, column 368.
Was expecting:
    ")" ...


query invalidQuery. Encountered " <STRING_LITERAL> "\'([0-9]+),[0-9]+,[0-9]+\' "" at line 1, column 368.
Was expecting:
    ")" ...

I'm able to exectute a working query in bigquery directly as such:

SELECT INTEGER(PL_REF) AS PL_REF, INTEGER(PL_HET) AS PL_HET, POW(INTEGER(PL_ALT),10) AS PL_ALT FROM 
   (SELECT 
       REGEXP_EXTRACT([N2_PL], '([0-9]+),[0-9]+,[0-9]+') AS [PL_REF],
       REGEXP_EXTRACT([N2_PL], '[0-9]+,([0-9]+),[0-9]+') AS [PL_HET],
       REGEXP_EXTRACT([N2_PL], '[0-9]+,[0-9]+,([0-9]+)') AS [PL_ALT]
    FROM [bamsurgeon.variants3])
LIMIT 10

Also note the power function for the third variable (PL_ALT), I was only able to execute this directly as well. bigrquery doesn't recognize ** or ^ from what I have seen, but does appear to recognize POW, although without being able to convert successfully to an integer it will not function.

Thanks!

@krlmlr

This comment has been minimized.

Copy link
Member

commented Feb 24, 2016

The following should work:

df %>%
  mutate(PL_REF_CHR = str_extract(N2_PL,"([0-9]+),[0-9]+,[0-9]+")) %>%
  mutate(PL_REF = as.integer(PL_REF_CHR))

I can replicate the problems with your first variant, and I think that they should be fixed. Schema-agnostic example:

tbl %>% mutate(b = as.integer(str_extract("123", ".(.).")))
tbl %>% mutate(b = str_extract("123", ".(.).")) %>% mutate(c = as.integer(b))
@hadley

This comment has been minimized.

Copy link
Member

commented Apr 18, 2017

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down the page.

@hadley

This comment has been minimized.

Copy link
Member

commented Apr 20, 2017

library(dplyr)
library(bigrquery)

con <- DBI::dbConnect(dbi_driver(),
  project = "bigrquery-examples",
  dataset = "test"
)
DBI::dbWriteTable(con, "test", data.frame(x = "a123"))

test <- tbl(con, "test")
test %>% 
  mutate(y = as.integer(str_extract(x, "([0-9]+)"))) %>%
  show_query()

DBI::dbRemoveTable(con, "test")

@hadley hadley closed this in 14e7612 Apr 20, 2017

Zsedo pushed a commit to Zsedo/bigrquery that referenced this issue Jun 26, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.