
Can't get sparklyr to connect with Shiny #1100

Closed · asodemann opened this issue Oct 31, 2017 · 9 comments
@asodemann commented Oct 31, 2017

I'm pretty new to Shiny and Spark.

I want to deploy a Shiny app with a Spark connection. Everything works as it should when I just hit Run App, but whenever I try to publish it, I get the error:

Error in value[3L] : SPARK_HOME directory '/usr/lib/spark' not found
Calls: local ... tryCatch -> tryCatchList -> tryCatchOne ->
Execution halted

This directory exists on my cluster, so I'm not sure why it's not finding it.
Here's the code I'm trying to publish.

library(sparklyr)
library(shiny)
library(dplyr)    # needed for tbl(), mutate(), filter() and collect()
library(ggplot2)
library(rmarkdown)


Sys.setenv(SPARK_HOME = '/usr/lib/spark')
config <- spark_config()
spark_install(version = "2.2.0")
sc <- spark_connect(master = 'yarn-client', version = '2.2.0')
tbl_cache(sc, 'output_final_v2')
output_tbl2 <- tbl(sc, 'output_final_v2')


ui <- fluidPage(
  
  textInput("name", "Enter Shortname", "company 1"),
  #selectInput("shortname", "Choose Name", choices = l),
  textInput("item_name", "Enter Item Name"),
  selectInput("month", "Choose Month", choice= c("January","February","March", "April", 
                                                 "May", "June", "July", "August", "September", 
                                                 "October", "November", "December")),
  selectInput("dow","Choose Day of Week", choice = c("Monday", "Tuesday", "Wednesday",
                                                     "Thursday", "Friday", "Saturday", "Sunday")),
  numericInput("count_customers", "Enter Number of Customers:", 2),
  numericInput("views", "Enter Number of Views", 30),
  
  
  plotOutput("plot1"),
  plotOutput("plot2"),
  plotOutput("plot3")
  
)

server <- function(input, output, session) {
  
  
  C2 <- reactive(
    output_tbl2 %>%
      mutate(views = input$views) %>%
      filter(input$name == shortname) %>%
      filter(input$dow == dow) %>%
      filter(input$month == month) %>%
      filter(input$item_name == item) %>%
      filter(input$count_customers == count_customers) %>%
      collect()
  )

  output$plot1 <- renderPlot({
    p1 <- ggplot2::ggplot(data = C2(), aes(x = price_per_customer, y = final_probability)) +
      geom_line() +
      ggtitle("Probability of Purchase") +
      labs(y = "Probability", x = "Item Price")
    print(p1)
  })

  output$plot2 <- renderPlot({
    p2 <- ggplot2::ggplot(data = C2(), aes(x = price_per_customer, y = (views * final_probability) * price_per_customer)) +
      geom_line() +
      geom_hline(aes(yintercept = max((views * final_probability) * price_per_customer))) +
      ggtitle("Projected Revenue") +
      labs(y = "Expected Revenue", x = "Item Price")
    print(p2)
  })

  output$plot3 <- renderPlot({
    p3 <- ggplot2::ggplot(data = C2(), aes(x = price_per_customer)) +
      geom_line(aes(y = (views * final_probability) * price_per_customer)) +
      geom_line(aes(y = (views * final_probability) / price_per_customer)) +
      ggtitle("Iso-Profit vs Expected Volume")
    print(p3)
  })
  
  
}

shinyApp(ui, server)
@javierluraschi (Collaborator) commented:

@asodemann If Spark is already installed on the cluster, using Sys.setenv(SPARK_HOME = '/usr/lib/spark') should work; however, the error could be caused by permissions. Also, since Spark is already installed, you don't need to run spark_install(version = "2.2.0"), as that would create a second installation you don't need.
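
In other words, the connection block can be reduced to something like this (a minimal sketch, assuming the cluster's Spark 2.2.0 lives at /usr/lib/spark):

library(sparklyr)

# Point sparklyr at the cluster's existing Spark installation; no
# spark_install() call is needed, since that would download a second copy.
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
sc <- spark_connect(master = "yarn-client", version = "2.2.0")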

@asodemann (Author) commented Oct 31, 2017

Thanks for responding! How do I change the permissions? I also tried this method: cluster_url <- system('cat /root/spark-ec2/cluster-url', intern = TRUE) and got 'permission denied'. When I added sudo to the command, I got the error 'no tty present and no askpass program specified'. @javierluraschi

@javierluraschi (Collaborator) commented:

@asodemann by permissions I mean using chmod on the Spark directory to adjust permissions for the user the Shiny application runs as. Afterwards, you might also need to create a Hadoop user directory for this user with something like the following:

hadoop dfs -mkdir /user/hadoop2
hadoop dfs -chown -R hadoop2 /user/hadoop2
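
For the chmod step itself, a hypothetical one-liner run from R (or directly in a shell); the path and mode here are assumptions, so adjust them to your setup:

# Make the Spark installation readable by the user the Shiny app runs as.
# '/usr/lib/spark' and the 'a+rX' mode are illustrative, not prescriptive.
system("sudo chmod -R a+rX /usr/lib/spark")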

You can also reach me at http://gitter.im/rstudio/sparklyr for a quicker answer.

@javierluraschi (Collaborator) commented Nov 7, 2017

@asodemann here is a step-by-step guide... could you try getting this running and then modifying this base Shiny app to match your current one?

Step 1 - Create EMR Cluster

[screenshots: EMR cluster creation settings]

For convenience, here is the bootstrap action:

s3://aws-bigdata-blog/artifacts/aws-blog-emr-rstudio-sparklyr/rstudio_sparklyr_emr5.sh

and

--sparklyr --rstudio --shiny --rstudio-url https://s3.amazonaws.com/rstudio-dailybuilds/rstudio-server-rhel-1.1.331-x86_64.rpm

[screenshot: security group configuration]

Notice that I created a security group with access to all ports, as a test.

Step 2 - Validate RStudio and Shiny Server work

Navigate to RStudio:

http://ec2-54-184-5-###.us-west-2.compute.amazonaws.com:8787/

Navigate to Shiny:

http://ec2-54-184-5-###.us-west-2.compute.amazonaws.com:3838/
http://ec2-54-184-5-###.us-west-2.compute.amazonaws.com:3838/sample-apps/hello/

Both apps should load fine.

Step 3 - Create Shiny App that uses sparklyr from RStudio

Log in to RStudio and create a new Shiny app. Then modify it to:

#
# This is a Shiny web application. You can run the application by clicking
# the 'Run App' button above.
#
# Find out more about building applications with Shiny here:
#
#    http://shiny.rstudio.com/
#

library(shiny)
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client")
faithful_tbl <- copy_to(sc, faithful)

# Define UI for application that draws a histogram
ui <- fluidPage(
   
   # Application title
   titlePanel("Old Faithful Geyser Data"),
   
   # Sidebar with a slider input for number of bins 
   sidebarLayout(
      sidebarPanel(
         sliderInput("bins",
                     "Number of bins:",
                     min = 1,
                     max = 50,
                     value = 30)
      ),
      
      # Show a plot of the generated distribution
      mainPanel(
         plotOutput("distPlot")
      )
   )
)

# Define server logic required to draw a histogram
server <- function(input, output) {
   
   output$distPlot <- renderPlot({
      # generate bins based on input$bins from ui.R
      x    <- faithful_tbl %>% pull(waiting)
      bins <- seq(min(x), max(x), length.out = input$bins + 1)
      
      # draw the histogram with the specified number of bins
      hist(x, breaks = bins, col = 'darkgray', border = 'white')
   })
}

# Run the application 
shinyApp(ui = ui, server = server)

Notice that I've only changed these lines from the sample that RStudio creates:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client")
faithful_tbl <- copy_to(sc, faithful)

and

x    <- faithful_tbl %>% pull(waiting)

Step 4 - Give the shiny user access in Hadoop

Using the RStudio terminal panel, run:

hadoop dfs -mkdir /user/shiny
hadoop dfs -chown -R shiny /user/shiny

Step 5 - Deploy Shiny app

In my case I ran from the RStudio terminal:

sudo mkdir /srv/shiny-server/sample-apps/sparklyr
sudo cp ~/sparklyr/app.R /srv/shiny-server/sample-apps/sparklyr/app.R

Navigate to:

http://ec2-54-184-5-###.us-west-2.compute.amazonaws.com:3838/sample-apps/sparklyr

[screenshot: the sparklyr sample app running on Shiny Server]

@asodemann (Author) commented:

I was able to get it to work with these steps. Thank you!!! @javierluraschi

@javierluraschi (Collaborator) commented:

@asodemann No problem, glad we could help! Closing.

@JasmaB commented Apr 21, 2018

@javierluraschi We have published a Shiny app, but since we are dealing with big data, we need to execute sparklyr's copy_to() command only once. Is there a way to save 'sc' and have app.R read from it, instead of copy_to(sc, filename) being executed each time the Shiny app is opened in the browser?

@kevinykuo (Collaborator) commented Apr 21, 2018

@JasmaB is the "big data" being created by the Shiny app user, or is it being read from disk locally? If the data is on disk, it makes more sense for Spark to read it directly rather than loading it into R and using copy_to().

cc @edgararuiz
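
For example, a minimal sketch (the HDFS path and table name here are made up; swap in spark_read_parquet() or a similar reader depending on your file format):

library(sparklyr)

sc <- spark_connect(master = "yarn-client")

# Have Spark read the file straight from HDFS and cache it once, rather than
# loading it into R and shipping it back to Spark with copy_to().
big_tbl <- spark_read_csv(sc, name = "big_data",
                          path = "hdfs:///user/shiny/big_data.csv",
                          memory = TRUE)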

@JasmaB commented Apr 23, 2018 via email
