Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vCard/JSON extraction #19443

Closed
wants to merge 32 commits into from
Closed
Changes from 11 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
eb2c7e0
Merge pull request #11 from servo/master
niravjain Nov 13, 2017
015ef37
Merge pull request #12 from servo/master
CJ8664 Nov 22, 2017
43aeb7c
Added microdata module
niravjain Nov 24, 2017
f306835
Code to send msg from servo to servoshell (EmbedderMsg)
CJ8664 Nov 25, 2017
0530512
Merge branch 'master' of https://github.com/CJ8664/servo
CJ8664 Nov 25, 2017
f939632
naming convention
CJ8664 Nov 25, 2017
3d68d2a
trying serde_json crate
CJ8664 Nov 25, 2017
a4ed961
Uploading erroneous code
CJ8664 Nov 25, 2017
4628943
Uploading erroneous code
CJ8664 Nov 25, 2017
cbfc666
Merge remote-tracking branch 'refs/remotes/origin/serde_try' into ser…
CJ8664 Nov 25, 2017
146dd58
Merge pull request #13 from CJ8664/serde_try
CJ8664 Nov 25, 2017
9284187
serde json working for Hashmap
CJ8664 Nov 25, 2017
dd4dc70
Updated the servo-shell communication
CJ8664 Nov 29, 2017
43650c2
vCard working
CJ8664 Nov 29, 2017
19408a0
Adding adr to vCard
niravjain Nov 30, 2017
79d140d
Adding adr to vCard
niravjain Nov 30, 2017
aab6900
Merged vcard and json logic
CJ8664 Nov 30, 2017
35468f8
Fixed tidy errors
niravjain Nov 30, 2017
8cafd64
Updated code to pass the type of microdata as a parameter
CJ8664 Dec 1, 2017
87d1efa
Merge branch 'vcard' of https://github.com/CJ8664/servo into vcard
CJ8664 Dec 1, 2017
c580c3f
Added code to notify user via change in title
CJ8664 Dec 1, 2017
3cd9731
Created dummy test cases
CJ8664 Dec 1, 2017
d0cb12e
Removed JSON code and replaced with a stub
CJ8664 Dec 1, 2017
0ae2d08
Merge pull request #14 from CJ8664/vcard
CJ8664 Dec 1, 2017
7131823
Merge pull request #15 from servo/master
CJ8664 Dec 1, 2017
a3e5b9f
Updated manifest
CJ8664 Dec 1, 2017
0e5023d
Partially updated the code based on reviews
CJ8664 Dec 2, 2017
37f70c6
Fixed lint issues
CJ8664 Dec 2, 2017
543de1c
Fixed lint issues
CJ8664 Dec 2, 2017
a781bba
Merge fix
CJ8664 Dec 12, 2017
04a043d
Merge branch 'servo-master'
CJ8664 Dec 12, 2017
c740aab
Rebuild cargo
CJ8664 Dec 12, 2017
File filter...
Filter file types
Jump to…
Jump to file
Failed to load files.

Always

Just for now

Some generated files are not rendered by default. Learn more.

@@ -147,7 +147,7 @@ pub enum EmbedderMsg {
/// The load of a page has completed
LoadComplete(TopLevelBrowsingContextId),
/// Sends microdata

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

We should provide slightly more informative documentation here ;)

SendMicrodata(String),
SendMicrodata(String, String),
}

/// Messages from the painting thread and the constellation thread to the compositor thread.
@@ -193,6 +193,6 @@ pub trait WindowMethods {
/// run the event loop at the vsync interval.
fn set_animation_state(&self, _state: AnimationState) {}

/// Print Microdata on the Console
fn print_microdata(&self, _data: String) {}
/// Print Microdata on the Console or write to file
fn print_microdata(&self, _data: String, _datatype: String) {}

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

This doesn't actually have to print anything when implemented; we should call it microdata_available instead, and document what the arguments present.

This comment has been minimized.

Copy link
@CJ8664

CJ8664 Dec 2, 2017

Author

Actually, the name of the function is misleading we are actually writing the output to a file ( .vcf). We'll change the comment.

}
@@ -1321,8 +1321,8 @@ impl<Message, LTF, STF> Constellation<Message, LTF, STF>
FromScriptMsg::SetFullscreenState(state) => {
self.embedder_proxy.send(EmbedderMsg::SetFullscreenState(source_top_ctx_id, state));
}
FromScriptMsg::SendMicrodata(data) => {
self.embedder_proxy.send(EmbedderMsg::SendMicrodata(data));
FromScriptMsg::SendMicrodata(data, datatype) => {
self.embedder_proxy.send(EmbedderMsg::SendMicrodata(data, datatype));
}
}
}
@@ -82,6 +82,7 @@ script_plugins = {path = "../script_plugins"}
script_traits = {path = "../script_traits"}
selectors = { path = "../selectors" }
serde = "1.0"
serde_derive = "1.0"
serde_json = "1.0"
servo_allocator = {path = "../allocator"}
servo_arc = {path = "../servo_arc"}
@@ -1710,9 +1710,17 @@ impl Document {
// TODO: completely loaded.

// Step 13.

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

The comments here refer to actual step numbers defined in the HTML specification. We should not add ones for steps that do not exist :)

This comment has been minimized.

Copy link
@CJ8664

CJ8664 Dec 2, 2017

Author

The previous comments were already present, we just added step 13 for microdata as discussed our logic starts after page completes loading.

This comment has been minimized.

Copy link
@jdm

jdm Dec 2, 2017

Member

Right, but the comments including steps refer to the steps in https://html.spec.whatwg.org/multipage/#the-end . There is no step there about microdata; that's something we're adding that is not part of the specification.

let result = Microdata::parse(self);
let event = ScriptMsg::SendMicrodata(result);
self.send_to_constellation(event);
let htmlelement = self.get_html_element();
let result = Microdata::parse(self, htmlelement.unwrap().upcast::<Node>());
if !result.get("vcard").unwrap().is_empty() {
let event = ScriptMsg::SendMicrodata(result.get("vcard").unwrap().to_string(), "vcard".to_string());
self.send_to_constellation(event);
self.SetTitle(DOMString::from("Extracted vCard".to_string()));

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

We should not be setting the title in this code here. That should happen in the embedder instead.

} else if !result.get("json").unwrap().is_empty() {
let event = ScriptMsg::SendMicrodata(result.get("json").unwrap().to_string(), "json".to_string());
self.send_to_constellation(event);
self.SetTitle(DOMString::from("Extracted JSON".to_string()));
}
}

// https://html.spec.whatwg.org/multipage/#pending-parsing-blocking-script
@@ -923,6 +923,10 @@ impl Element {
&self.local_name
}

pub fn tag_name(&self) -> DOMString {
self.TagName()

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

You should be able to import ElementMethods and invoke TagName() directly from the caller.

This comment has been minimized.

Copy link
@vjhebbar

vjhebbar Dec 2, 2017

Contributor

This public method was added to test something, this will be removed.

}

pub fn parsed_name(&self, mut name: DOMString) -> LocalName {
if self.html_element_in_html_document() {
name.make_ascii_lowercase();
@@ -85,6 +85,8 @@ extern crate script_layout_interface;
extern crate script_traits;
extern crate selectors;
extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_json;
extern crate servo_allocator;
extern crate servo_arc;
@@ -2,28 +2,160 @@
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, You can obtain one at http://mozilla.org/MPL/2.0/. */

use dom::bindings::codegen::Bindings::DocumentBinding::DocumentBinding::DocumentMethods;
use dom::bindings::codegen::Bindings::ElementBinding::ElementBinding::ElementMethods;
use dom::bindings::inheritance::Castable;
use dom::bindings::root::DomRoot;
use dom::characterdata::CharacterData;
use dom::document::Document;
use dom::element::Element;
use dom::htmlelement::HTMLElement;
use dom::node::Node;
use dom::text::Text;
use serde_json;
use std::borrow::Cow;
use std::collections::HashMap;

pub struct Microdata {}

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

Rather than an empty structure with static methods, we should use plain old functions in this file instead. Code in other modules can use microdata::parse instead.


impl Microdata {
//[Pref="dom.microdata.testing.enabled"]
pub fn parse() {
println!("Hello");
let mut book_reviews = HashMap::new();
let mut rating = HashMap::new();
pub fn parse(doc: &Document, node: &Node) -> HashMap<String, String> {

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

Rather than returning a hashmap with two known keys, we should return a richer data type - a structure that contains two Option fields would be more clear.

This comment has been minimized.

Copy link
@vjhebbar

vjhebbar Dec 2, 2017

Contributor

Would an enum be suitable here? Something like

enum Microdata {
JSON(some type),
vCard(some type)
}
Construct it as Microdata::JSON(jsonString) or Microdata::VCARD(vcardString) based on conditional checks and return it as Option and use a match statement to get appropriate representation at the client end.

This comment has been minimized.

Copy link
@jdm

jdm Dec 2, 2017

Member

That would be sensible!

let serialized_vcard = Self::parse_vcard(doc);
let serialized_json = Self::parse_json(node);
let mut serialized_data: HashMap<String, String> = HashMap::new();
serialized_data.insert("vcard".to_string(), serialized_vcard);
serialized_data.insert("json".to_string(), serialized_json);
return serialized_data;
}

pub fn parse_vcard(doc: &Document) -> String {
let ele = doc.upcast::<Node>();
let mut start_vcard = false;
let mut result: String = String::new();
let mut master_map: HashMap<String, HashMap<String, String>> = HashMap::new();
let mut master_key: String = String::new();

result += "BEGIN:VCARD\nPROFILE:VCARD\nVERSION:4.0\nSOURCE:";
result += doc.url().as_str();

let title = doc.Title();
if !title.is_empty() && !title.trim().is_empty() {
result += "\nNAME:";
result += title.trim();
}

result += "\n";

for element in ele.traverse_preorder().filter_map(DomRoot::downcast::<Element>) {

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

Please add comments like // Step 3.4 and // https://html.spec.whatwg.org/multipage/#concept-property-value as appropriate in these new methods to allow easily matching the implementation against the specification.

if element.is::<HTMLElement>() {
if element.has_attribute(&local_name!("itemtype")) {
let mut atoms = element.get_tokenlist_attribute(&local_name!("itemtype"), );
if !atoms.is_empty() {
let val = atoms.remove(0);
if val.trim() == "http://microformats.org/profile/hcard" {
if !start_vcard {
start_vcard = true;
} else {
break;
}
}
}
}
if start_vcard {

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

The implementation here doesn't appear to follow the algorithms defined in the specification. I would expect to see code that follows a pattern like this:

// https://html.spec.whatwg.org/multipage/#extracting-a-vcard
// Steps 1 & 2
let elements = ele.traverse_preorder().filter_map(DomRoot::downcast::<HTMLElement>);
let node = elements.find(|element| {
    element.Itemtypes().iter().find(|val| val == "http://microformats.org/profile/hcard").is_some())
});
let node = match node {
    Some(node) => node,
    None => /* return no vcard */,
};
/* prepare the vcard output */
// Step 11
for element in collect_properties_of_item(node) {
    for property in element.PropertyNames() {
        // Step 11.2
        let value = property_value(element, property);
        /* remaining steps */
    }
}
/* finish up vcard output */

Does that make sense?

let mut atoms = element.get_tokenlist_attribute(&local_name!("itemprop"), );
if !atoms.is_empty() {
let temp_key = atoms.remove(0);
if element.has_attribute(&local_name!("itemscope")) {
master_key = String::from(temp_key.trim()).to_owned();
let dup_master_key = Cow::Borrowed(&master_key);
master_map.entry(dup_master_key.to_string()).or_insert(HashMap::new());
} else {
let temp = String::from(temp_key.trim()).to_owned();
let dup_key = Cow::Borrowed(&temp);
let data = String::from(element.GetInnerHTML().unwrap());
let dup_master_key = Cow::Borrowed(&master_key);
let temp_map = master_map.entry(dup_master_key.to_string()).or_insert(HashMap::new());
temp_map.insert(dup_key.to_string(), String::from(data));
}
}
}
}
}
let vcard_parts = ["n", "org", "tel", "adr"];
for info_type in vcard_parts.iter() {
let detail_map_val = master_map.get(*info_type);
if detail_map_val.is_none() {
continue;
}
let detail_map = detail_map_val.unwrap();
match *info_type {
"n" => {
let mut n_value: String = String::new();

rating.insert("a", "1");
rating.insert("b", "2");
rating.insert("c", "3");
rating.insert("d", "4");
let name_parts = ["family-name", "given-name",
"additional-name", "honorific-prefix", "honorific-suffix"];
for part in name_parts.iter() {
if detail_map.contains_key(*part) {
n_value += format!("{};", detail_map.get(*part).unwrap()).as_str();
}
}
n_value.pop();

book_reviews.insert("Adventures of Huckleberry Finn", rating);
result += format!("{}:{}\n", info_type.to_ascii_uppercase(), n_value).as_str();
},
"org" => {
let mut org_value: String = String::new();

let j = serde_json::to_string(&book_reviews);
let org_parts = ["organization-name", "organization-unit"];
for part in org_parts.iter() {
if detail_map.contains_key(*part) {
org_value += format!("{};", detail_map.get(*part).unwrap()).as_str();
}
}
org_value.pop();

result += format!("{}:{}\n", info_type.to_ascii_uppercase(), org_value).as_str();
},
"tel" => {
let mut tel_value: String = String::new();

let tel_parts = ["value"];
for part in tel_parts.iter() {
if detail_map.contains_key(*part) {
tel_value += format!("{};", detail_map.get(*part).unwrap()).as_str();
}
}
tel_value.pop();

result += format!("{}:{}\n", info_type.to_ascii_uppercase(), tel_value).as_str();
},
"adr" => {
let mut adr_value: String = String::new();

let adr_parts = ["street-address", "locality", "region", "postal-code",
"country-name", "post-office-box", "extended-address"];
for part in adr_parts.iter() {
if detail_map.contains_key(*part) {
adr_value += format!("{};", detail_map.get(*part).unwrap()).as_str();
}
}
adr_value.pop();

result += format!("{}:{}\n", info_type.to_ascii_uppercase(), adr_value).as_str();
},
_ => {},
}
}
result += "END:VCARD";
if start_vcard {
return result;
} else {
return "".to_string();
}
}

// Print, write to a file, or send to an HTTP server.
println!("printing json from Microdata module");
println!("{:?}", j);
pub fn parse_json(node: &Node) -> String {
// TODO Write the logic for JSON Parsing
return "".to_string();
}
}
@@ -164,7 +164,7 @@ pub enum ScriptMsg {
/// Requests that the compositor shut down.
Exit,
/// Event to parse PrintMicrodata
SendMicrodata(String),
SendMicrodata(String, String),
}

/// Entities required to spawn service workers
@@ -463,8 +463,8 @@ impl<Window> Servo<Window> where Window: WindowMethods + 'static {
self.compositor.window.load_end(top_level_browsing_context);
},

(EmbedderMsg::SendMicrodata(data), ShutdownState::NotShuttingDown) => {
self.compositor.window.print_microdata(data);
(EmbedderMsg::SendMicrodata(data, datatype), ShutdownState::NotShuttingDown) => {
self.compositor.window.print_microdata(data, datatype);
},
}
}
{}
]
],
"mozilla/microdata/json.html": [
[
"/_mozilla/mozilla/microdata/json.html",
{}
]
],
"mozilla/microdata/none_check.html": [
[
"/_mozilla/mozilla/microdata/none_check.html",
{}
]
],
"mozilla/microdata/vcard.html": [
[
"/_mozilla/mozilla/microdata/vcard.html",
{}
]
],
"mozilla/mime_sniffing_font_context.html": [
[
"/_mozilla/mozilla/mime_sniffing_font_context.html",
"0a941e2eb10013ab55049d4d6007d8301149f651",
"testharness"
],
"mozilla/microdata/json.html": [
"d5e1649c88102f27da5fe1ac16715cfee7f70f84",
"testharness"
],
"mozilla/microdata/none_check.html": [
"5addb47e08bdb4ad8293f22dffe95e4760336e08",
"testharness"
"0d81eecf7e52feb0964e879ffe9450881141caa5",
"testharness"
],
"mozilla/microdata/vcard.html": [
"d5e1649c88102f27da5fe1ac16715cfee7f70f84",
"testharness"
],
"mozilla/mime_sniffing_font_context.html": [
"1311e72e0a0dafa6fae594ca1ce2deca164acd36",
"testharness"
@@ -0,0 +1,12 @@
<!doctype html>
<meta charset="utf-8">
<title>JSON</title>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>

<script>
test(function () {
/* do something long and slow here */

This comment has been minimized.

Copy link
@jdm

jdm Dec 1, 2017

Member

Not sure what the long and slow means; if you want automated tests for this code, I recommend adding a test-only APIs to Document.webidl that return the result of extracting the microdata as a string that you can compare against expected results.

This comment has been minimized.

Copy link
@vjhebbar

vjhebbar Dec 2, 2017

Contributor

We will remove this test case. For now we plan to use a unix diff command to compare the expected and generated output.

assert_true(true);
}, "No operations");
</script>
@@ -0,0 +1,12 @@
<!doctype html>
<meta charset="utf-8">
<title>vCard</title>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>

<script>
test(function () {
/* do something long and slow here */
assert_true(true);
}, "No operations");
</script>
ProTip! Use n and p to navigate between commits in a pull request.
You can’t perform that action at this time.