
Can't get sparklyr to connect with Shiny #1100

Closed · asodemann opened this issue Oct 31, 2017 · 9 comments
@asodemann commented Oct 31, 2017

I'm pretty new to Shiny and Spark.

I want to deploy a Shiny app with a Spark connection. Everything works as it should when I just hit Run App, but whenever I try to publish it, I get the error:

Error in value[3L] : SPARK_HOME directory '/usr/lib/spark' not found
Calls: local ... tryCatch -> tryCatchList -> tryCatchOne ->
Execution halted

This directory exists on my cluster, so I'm not sure why it's not finding it.
Here's the code I'm trying to publish.

library(sparklyr)
library(shiny)
library(dplyr)    # needed for tbl(), mutate(), filter() and collect()
library(ggplot2)
library(rmarkdown)


Sys.setenv(SPARK_HOME = '/usr/lib/spark')
config <- spark_config()
spark_install(version = "2.2.0")
sc <- spark_connect(master = 'yarn-client', version = '2.2.0')
tbl_cache(sc, 'output_final_v2')
output_tbl2 <- tbl(sc, 'output_final_v2')


ui <- fluidPage(
  
  textInput("name", "Enter Shortname", "company 1"),
  #selectInput("shortname", "Choose Name", choices = l),
  textInput("item_name", "Enter Item Name"),
  selectInput("month", "Choose Month", choice= c("January","February","March", "April", 
                                                 "May", "June", "July", "August", "September", 
                                                 "October", "November", "December")),
  selectInput("dow","Choose Day of Week", choice = c("Monday", "Tuesday", "Wednesday",
                                                     "Thursday", "Friday", "Saturday", "Sunday")),
  numericInput("count_customers", "Enter Number of Customers:", 2),
  numericInput("views", "Enter Number of Views", 30),
  
  
  plotOutput("plot1"),
  plotOutput("plot2"),
  plotOutput("plot3")
  
)

server <- function(input, output, session) {
  
  
  C2 <- reactive(
    output_tbl2 %>%
      mutate(views = input$views) %>%
      filter(input$name == shortname) %>%
      filter(input$dow == dow) %>%
      filter(input$month == month) %>%
      filter(input$item_name == item) %>%
      filter(input$count_customers == count_customers) %>%
      collect()
  )

  output$plot1 <- renderPlot({
    p1 <- ggplot2::ggplot(data = C2(), aes(x = price_per_customer, y = final_probability)) +
      geom_line() +
      ggtitle("Probability of Purchase") +
      labs(y = "Probability", x = "Item Price")
    print(p1)
  })

  output$plot2 <- renderPlot({
    p2 <- ggplot2::ggplot(data = C2(), aes(x = price_per_customer, y = (views * final_probability) * price_per_customer)) +
      geom_line() +
      geom_hline(aes(yintercept = max((views * final_probability) * price_per_customer))) +
      ggtitle("Projected Revenue") +
      labs(y = "Expected Revenue", x = "Item Price")
    print(p2)
  })

  output$plot3 <- renderPlot({
    p3 <- ggplot2::ggplot(data = C2(), aes(x = price_per_customer)) +
      geom_line(aes(y = (views * final_probability) * price_per_customer)) +
      geom_line(aes(y = (views * final_probability) / price_per_customer)) +
      ggtitle("Iso-Profit vs Expected Volume")
    print(p3)
  })
  
  
}

shinyApp(ui, server)
@javierluraschi (Collaborator) commented:

@asodemann If Spark is already installed on the cluster, using Sys.setenv(SPARK_HOME = '/usr/lib/spark') should work; however, the error could be caused by permissions. Also, since Spark is already installed, you don't need to run spark_install(version = "2.2.0"), as that would create a second installation you don't need.
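
In other words, the connection block can be reduced to something like this (a minimal sketch, assuming the cluster's Spark 2.2.0 lives at /usr/lib/spark):

library(sparklyr)

# Point sparklyr at the cluster's existing Spark installation; no
# spark_install() call is needed, since that would download a second copy.
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
sc <- spark_connect(master = "yarn-client", version = "2.2.0")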

@asodemann (Author) commented Oct 31, 2017

Thanks for responding! How do I change the permissions? I also tried this method: cluster_url <- system('cat /root/spark-ec2/cluster-url', intern = TRUE) and got 'permission denied'. When I added sudo to the command, I got the error 'no tty present and no askpass program specified'. @javierluraschi

@javierluraschi (Collaborator) commented:

@asodemann by permissions I mean using chmod on the Spark directory to adjust permissions for the user the Shiny application runs as. Afterwards, you might also need to create a Hadoop user directory for this user with something like the following:

hadoop dfs -mkdir /user/hadoop2
hadoop dfs -chown -R hadoop2 /user/hadoop2
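
For the chmod step itself, a hypothetical one-liner run from R (or directly in a shell); the path and mode here are assumptions, so adjust them to your setup:

# Make the Spark installation readable by the user the Shiny app runs as.
# '/usr/lib/spark' and the 'a+rX' mode are illustrative, not prescriptive.
system("sudo chmod -R a+rX /usr/lib/spark")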

You can also reach me at http://gitter.im/rstudio/sparklyr for a quicker answer.

@javierluraschi (Collaborator) commented Nov 7, 2017

@asodemann here is a step-by-step guide... could you try getting this running and then modifying this base Shiny app to match your current one?

Step 1 - Create EMR Cluster

[screenshots: EMR cluster creation settings]

For convenience, here is the bootstrap action:

s3://aws-bigdata-blog/artifacts/aws-blog-emr-rstudio-sparklyr/rstudio_sparklyr_emr5.sh

and

--sparklyr --rstudio --shiny --rstudio-url https://s3.amazonaws.com/rstudio-dailybuilds/rstudio-server-rhel-1.1.331-x86_64.rpm

[screenshot: security group configuration]

Notice that I created a security group with access to all ports, as a test.

Step 2 - Validate RStudio and Shiny Server work

Navigate to RStudio:

http://ec2-54-184-5-###.us-west-2.compute.amazonaws.com:8787/

Navigate to Shiny:

http://ec2-54-184-5-###.us-west-2.compute.amazonaws.com:3838/
http://ec2-54-184-5-###.us-west-2.compute.amazonaws.com:3838/sample-apps/hello/

Both apps should load fine.

Step 3 - Create Shiny App that uses sparklyr from RStudio

Log in to RStudio and create a new Shiny app. Then modify it to:

#
# This is a Shiny web application. You can run the application by clicking
# the 'Run App' button above.
#
# Find out more about building applications with Shiny here:
#
#    http://shiny.rstudio.com/
#

library(shiny)
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client")
faithful_tbl <- copy_to(sc, faithful)

# Define UI for application that draws a histogram
ui <- fluidPage(
   
   # Application title
   titlePanel("Old Faithful Geyser Data"),
   
   # Sidebar with a slider input for number of bins 
   sidebarLayout(
      sidebarPanel(
         sliderInput("bins",
                     "Number of bins:",
                     min = 1,
                     max = 50,
                     value = 30)
      ),
      
      # Show a plot of the generated distribution
      mainPanel(
         plotOutput("distPlot")
      )
   )
)

# Define server logic required to draw a histogram
server <- function(input, output) {
   
   output$distPlot <- renderPlot({
      # generate bins based on input$bins from ui.R
      x    <- faithful_tbl %>% pull(waiting)
      bins <- seq(min(x), max(x), length.out = input$bins + 1)
      
      # draw the histogram with the specified number of bins
      hist(x, breaks = bins, col = 'darkgray', border = 'white')
   })
}

# Run the application 
shinyApp(ui = ui, server = server)

Notice that I've only changed these lines from the sample that RStudio creates:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client")
faithful_tbl <- copy_to(sc, faithful)

and

x    <- faithful_tbl %>% pull(waiting)

Step 4 - Give the shiny user access in Hadoop

Using the RStudio terminal panel, run:

hadoop dfs -mkdir /user/shiny
hadoop dfs -chown -R shiny /user/shiny

Step 5 - Deploy Shiny app

In my case I ran from the RStudio terminal:

sudo mkdir /srv/shiny-server/sample-apps/sparklyr
sudo cp ~/sparklyr/app.R /srv/shiny-server/sample-apps/sparklyr/app.R

Navigate to:

http://ec2-54-184-5-###.us-west-2.compute.amazonaws.com:3838/sample-apps/sparklyr

[screenshot: the sparklyr sample app running on Shiny Server]

@asodemann (Author) commented:

I was able to get it to work with these steps. Thank you!!! @javierluraschi

@javierluraschi (Collaborator) commented:

@asodemann No problem, glad we could help! Closing.

@JasmaB commented Apr 21, 2018

@javierluraschi We have published a Shiny app, but since we are dealing with big data, we need to execute sparklyr's copy_to() command only once. Is there a way to save 'sc' and have app.R read from it, instead of copy_to(sc, filename) being executed each time the Shiny app is opened in the browser?

@kevinykuo (Collaborator) commented Apr 21, 2018

@JasmaB is the "big data" being created by the Shiny app user, or is it being read from disk locally? If the data is on disk, it makes more sense for Spark to read it directly rather than loading it into R and using copy_to().

cc @edgararuiz
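
For example, a minimal sketch (the HDFS path and table name here are made up; swap in spark_read_parquet() or a similar reader depending on your file format):

library(sparklyr)

sc <- spark_connect(master = "yarn-client")

# Have Spark read the file straight from HDFS and cache it once, rather than
# loading it into R and shipping it back to Spark with copy_to().
big_tbl <- spark_read_csv(sc, name = "big_data",
                          path = "hdfs:///user/shiny/big_data.csv",
                          memory = TRUE)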

@JasmaB commented Apr 23, 2018 via email
