Readme basic usage #16

Open
4 tasks
KadekM opened this issue Oct 25, 2016 · 6 comments

Comments

Owner

KadekM commented Oct 25, 2016

  • how to install it
  • how to create simple crawler
  • how to store result in file / json
  • how to compose crawlers

yelled1 commented Sep 8, 2017

Pulled the source:
git clone git@github.com:KadekM/scrawler.git
I used sbt package to create jar files (this was the wrong move).
Created a project:
.
├── build.sbt (see below)
├── lib (the compiled jar files went here; not necessary, and the cause of the error)
├── project
│   └── build.properties (specifies the sbt version, 0.13.15 in my case)
└── src
    └── main
        └── scala (myCrawler.scala goes here)

Imported the project into IntelliJ via sbt, created myCrawler.scala, copied the class, and added:
import com.marekkadek.scraper.Document
import com.marekkadek.scraper.jsoup.JsoupBrowser
import com.marekkadek.scrawler.crawlers.{Crawler, Visit, Yield, YieldData}
import fs2.{Strategy, Stream, Task} // this solves the two errors below

However, I am stuck on 2 errors:

  1. override protected def onDocument(document: Document): Stream[Task, Yield[String]] = {
     Error: Task takes type parameters
  2. Stream.emit(title) ++ Stream.emits(followableLinks)
     Error: Cannot resolve ++, emit, emits

I'm a complete newbie myself, so I am stuck here.


yelled1 commented Sep 8, 2017

I was able to compile it after changing import fs2.Task to import fs2._
But the run fails!

object WikiGo {
  def main(args: Array[String]): Unit = {
    val crawler = new myCrawler
    // crawl wikipedia sequentially and take 10 elements (titles of visited websites)
    val titles: Vector[String] = crawler.sequentialCrawl("https://wikipedia.org").take(10).runLog.unsafeRun
    println(titles)
  }
}

[IJ]> compile
[success] Total time: 0 s, completed Sep 8, 2017 11:12:51 PM
[IJ]> run
[info] Running WikiGo
[error] (run-main-5) java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
at com.marekkadek.scrawler.crawlers.Visit.<init>(crawlers.scala:12)
at com.marekkadek.scrawler.crawlers.Crawler.sequentialCrawl(crawlers.scala:35)
at WikiGo$.main(WikiGo.scala:5)
at WikiGo.main(WikiGo.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
[trace] Stack trace suppressed: run last compile:run for the full output.
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 0 s, completed Sep 8, 2017 11:12:57 PM
[IJ]>

Collaborator

visox commented Sep 9, 2017

Hi,

You probably have a dependency problem (the code is fine).

Can you share your build.sbt with us?

Also, you created a jar file for this project? Why not just add it as a dependency in build.sbt?

And regarding your first comment: this project/crawler basically emits an fs2 Stream, and for that you typically need to import fs2.{Strategy, Stream, Task}


yelled1 commented Sep 9, 2017

Hi:

Here's my build.sbt

name := "ScraperProject"

version := "1.1"

scalaVersion := "2.11.8"

libraryDependencies += "com.marekkadek" %% "scrawler" % "0.0.3"

Ah, the jar files were somehow causing the problem!
I removed the lib directory (with the jars) from the root dir and it ran fine.
I guess one cannot use libraryDependencies and jar files for the same library at the same time. My first Scala project with an external library dependency compiled and ran!
Thanks very much,
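
A note on the likely cause, offered as a hedged sketch rather than a confirmed diagnosis: sbt automatically adds any jars found in lib/ to the classpath as unmanaged dependencies, and a NoSuchMethodError on scala.Product.$init$ typically means classes compiled for Scala 2.12 are running against a 2.11 scala-library. Assuming the locally packaged jars were built with a different Scala version than the 2.11.8 set below, they would clash with the managed scrawler artifact. The build.sbt below repeats the settings from this thread; the last setting is optional, and its directory name "unmanaged-jars" is a hypothetical placeholder, not something from the project.

```scala
// build.sbt (sbt 0.13 syntax, matching the sbt version mentioned in this thread)
name := "ScraperProject"

version := "1.1"

// Should match the Scala binary version of every jar on the classpath.
scalaVersion := "2.11.8"

// %% appends the Scala binary version (here _2.11) to the artifact name,
// so sbt resolves a scrawler build compiled for the same Scala version.
libraryDependencies += "com.marekkadek" %% "scrawler" % "0.0.3"

// Optional: move sbt's unmanaged-jar directory away from the default lib/,
// so stray jars left there are not picked up.
// "unmanaged-jars" is a hypothetical directory name chosen for illustration.
unmanagedBase := baseDirectory.value / "unmanaged-jars"
```

Simply deleting the lib directory, as described above, has the same effect as the optional setting.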

Collaborator

visox commented Sep 9, 2017

Hi, no problem, happy crawling

@visox visox closed this as completed Sep 9, 2017
@visox visox reopened this Sep 9, 2017
Owner Author

KadekM commented Sep 10, 2017

@yelled1 yes, just use sbt for dependency management :)
Feel free to open an issue if you encounter any problems.
