About the application:
-----------------------------------------
The application harnesses the capabilities of the components below to create a web crawler that can crawl any site.
Here we are mainly interested in getting the pin codes of various areas/cities across India from a site called javataskforce.com.
This application uses:
a) Apache CXF:
makes REST/HTTP requests to the remote site and fetches the pages
b) JPA/Hibernate with HSQLDB:
uses JPA with Hibernate to populate HSQLDB with the information we are interested in
c) Apache Camel:
coordinates between CXF and JPA, taking over many tasks that would otherwise require a lot of coding,
and defines routes that state what to do and when to do it
Overview:
The whole application is interconnected by Camel routes, which are as follows (all routes are defined in camel-routes.xml):
a) states-invoke-route
i) invoke the crawler service to get the states
ii) parse/process the states obtained
iii) put them in another Queue
iv) store the states into HSQLDB
b) districts-invoke-route
i) poll the DB (HSQLDB) and get a State with a particular status (INITIAL)
ii) update the status of State to (IN_PROCESS)
iii) invoke the crawler service to obtain districts for state
iv) parse/process the districts obtained
v) put them in another Queue
vi) store the districts into HSQLDB
vii) mark the state as processed (COMPLETED)
c) pincode-invoke-route
i) poll the DB (HSQLDB) and get a District with a particular status (INITIAL)
ii) update the status of District to (IN_PROCESS)
iii) invoke the crawler service to obtain areas/cities for the fetched District
iv) parse/process the areas/cities obtained
v) put them in another Queue
vi) store the area/city into HSQLDB
vii) mark the district as processed (COMPLETED)
In a nutshell, each route invokes a crawl URL, the result of which is parsed/processed and then
saved into the crawl database.
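
The status handling described for the districts and pincode routes above (INITIAL, then IN_PROCESS, then COMPLETED) can be sketched in plain Java. This is only an illustration under assumptions: the names CrawlStatusSketch, Item and pollAndProcess are hypothetical, an in-memory list stands in for HSQLDB, and the crawl step is a placeholder function. The real application does this work through Camel routes and JPA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: these are not the application's actual classes.
class CrawlStatusSketch {

    // Status values named in the route descriptions.
    enum Status { INITIAL, IN_PROCESS, COMPLETED }

    // Stand-in for a State or District row in the database.
    static class Item {
        final String name;
        Status status = Status.INITIAL;

        Item(String name) {
            this.name = name;
        }
    }

    // Picks every INITIAL item, marks it IN_PROCESS, runs the placeholder
    // crawl step, collects the children it returns, then marks it COMPLETED.
    static List<String> pollAndProcess(List<Item> table,
                                       Function<String, List<String>> crawl) {
        List<String> children = new ArrayList<>();
        for (Item item : table) {
            if (item.status != Status.INITIAL) {
                continue; // already picked up or finished
            }
            item.status = Status.IN_PROCESS;
            children.addAll(crawl.apply(item.name));
            item.status = Status.COMPLETED;
        }
        return children;
    }
}
```

Items that are not in the INITIAL status are skipped, so running the same step again does not crawl a state or district twice.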
Configuring the application:
a) scheduling configuration:
a) open camel-routes.xml and change the cron expression in the following endpoint to a suitable time
quartz://myGroup/myTimerName?cron=30 06 00 03 08 ? 2016
b) open camel-routing-policy.xml and set the schedule time for
i) districts-invoke-route
ii) pincode-route-policy
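
To make the cron expression in the quartz endpoint above easier to edit, the small sketch below labels its fields. Quartz cron expressions list seconds, minutes, hours, day-of-month, month, day-of-week and an optional year, in that order. The class CronFieldsSketch and its label method are hypothetical helpers written for this note; they are not part of the application or of Quartz.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading a Quartz cron expression.
class CronFieldsSketch {

    // Field order used by Quartz cron expressions; the year field is optional.
    private static final String[] NAMES = {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    // Splits the expression on whitespace and labels each field by position.
    static Map<String, String> label(String cron) {
        String[] parts = cron.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException(
                "expected 6 or 7 fields, got " + parts.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            fields.put(NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // The sample expression from camel-routes.xml.
        label("30 06 00 03 08 ? 2016")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Read this way, the sample expression fires once, at 00:06:30 on 3 August 2016, so it has to be changed to a future time before the routes will start.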
Running the application:
a) mvn clean install
b) run the HSQL database
c) go to the hsqldb folder
on Windows:
i) click db_start.bat
on Unix/Linux:
i) sh db_start.sh
d) mvn exec:exec
this will execute com.javatask.Run
Monitoring :
a) The application can be monitored using JConsole on port 8764 (the port can be changed by changing the JMX settings in pom.xml):
<argument>-Dcom.sun.management.jmxremote</argument>
<argument>-Dcom.sun.management.jmxremote.authenticate=false</argument>
<argument>-Dcom.sun.management.jmxremote.ssl=false</argument>
<argument>-Dcom.sun.management.jmxremote.port=8764</argument>
open JConsole and add
service:jmx:rmi:///jndi/rmi://127.0.0.1:8764/jmxrmi
b) Monitoring the HSQL DB
- Windows client:
execute db_client.bat available in the hsqldb folder
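
The JMX endpoint from item a) above can also be reached from code instead of JConsole. The sketch below uses only the standard javax.management.remote API. The class JmxProbeSketch and its methods are hypothetical names chosen for this note, and the connection step assumes the application is already running with the JMX arguments from pom.xml.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch; not part of the application.
class JmxProbeSketch {

    // Builds the same service URL that is entered in JConsole.
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad JMX host or port", e);
        }
    }

    // Connects and returns the number of registered MBeans.
    // Fails with an IOException if nothing is listening on the endpoint.
    static int countMBeans(JMXServiceURL url) throws IOException {
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.getMBeanCount();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("MBeans: " + countMBeans(serviceUrl("127.0.0.1", 8764)));
    }
}
```

Because authentication and SSL are switched off in the arguments above, no credentials are passed here; a secured setup would need an environment map with credentials.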