This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

schema.md

  • call_stack:

    • Type: String
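    • Description: The call stack at the point when the function is called, with one frame per line. Each frame has the format: (function_name)(@)(javascript_source_file)(:)(line_number)(:)(column_number), and consecutive frames are separated by a newline character (shown escaped as \n in the example below).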
    • Example:
    jQuery.cookie@https://cdn.livechatinc.com/js/embedded.20171215135707.js:5:8393\nStore</s.get@https://cdn.livechatinc.com/js/embedded.20171215135707.js:8:3323\nStore</</s[p]@https://cdn.livechatinc.com/js/embedded.20171215135707.js:8:3746\nWindowsCommunicator.prototype.startCheckingForMainWindow/e<@https://cdn.livechatinc.com/js/embedded.20171215135707.js:10:11730
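
A minimal Python sketch of splitting a call_stack value into frames, assuming frames are separated by real newline characters (the example above shows them escaped) and that each frame has the name@url:line:column shape seen in the example. Frame and parse_call_stack are hypothetical helpers, not part of OpenWPM. Because the source URL itself contains colons ("https:"), the line and column numbers are split off from the right.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    """One entry of a call_stack value (hypothetical helper type)."""
    func_name: str
    script_url: str
    line: int
    column: int


def parse_call_stack(call_stack: str) -> List[Frame]:
    """Split a call_stack string into frames of the form name@url:line:column."""
    frames = []
    for entry in call_stack.split("\n"):
        if not entry:
            continue  # skip the empty string left by a trailing newline
        # Everything before the first "@" is the (possibly empty) function name.
        func_name, _, location = entry.partition("@")
        # The URL contains colons itself ("https:"), so split from the right.
        script_url, line, column = location.rsplit(":", 2)
        frames.append(Frame(func_name, script_url, int(line), int(column)))
    return frames


stack = (
    "jQuery.cookie@https://cdn.livechatinc.com/js/embedded.20171215135707.js:5:8393\n"
    "Store</s.get@https://cdn.livechatinc.com/js/embedded.20171215135707.js:8:3323"
)
for frame in parse_call_stack(stack):
    print(frame.func_name, frame.line, frame.column)
```

A frame without line and column numbers makes rsplit return fewer than three parts, so the unpacking raises ValueError; a real pipeline might catch that and mark the row as invalid.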
    
  • crawl_id:

    • Type: Integer
    • Description: crawl_id appears to have the value 1 in every JSON file. It is possible that this field was not used when the crawler generated the data.
    • Example:
    1
    
  • func_name:

    • Type: String
    • Description: The name of the JavaScript function. Because of obfuscation, function names are often meaningless and are best treated as opaque tokens. Anonymous functions have no name, so their value is an empty string.
    • Examples:
    ""
    a<4k
    getName
    
  • in_iframe:

    • Type: boolean
    • Description: in_iframe is a boolean that indicates whether the JavaScript code ran inside an iframe. This functionality was added on top of the original OpenWPM repository.
  • location:

    • Type: string
    • Description: The URL of the file that was being crawled to generate the JSON file. For iframe content, the location differs from the URL of the parent page on which the iframe was encountered. For example, suppose Parent.html embeds iFrame.html in an <iframe> tag, and iFrame.html runs alert(window.location) to report its location: the alert shows iFrame.html, not Parent.html. In the same way, when OpenWPM records content inside iFrame.html on Parent.html, it reports the location as iFrame.html, not Parent.html. Because the crawl was parallelized, iframe content cannot be associated with the parent site on which it was encountered; only the in_iframe field indicates whether the content executed inside an iframe. All objects in a JSON file that were accessed from the crawled page outside of an iframe should have the same location value. The URL can point to any type of file, such as .html or .js, or have no file extension.
    • Examples:
    https://www.dresslily.com/bottom-c-36.html
    http://www.vidalfrance.com/component/forme/?fid=2
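
The rule that all objects accessed outside an iframe share one location can be checked per JSON file. A minimal Python sketch, assuming each file has been loaded as a list of dictionaries with location and in_iframe keys, and that in_iframe holds a Python bool; the function names and the ads.example.com row are hypothetical.

```python
from typing import Any, Iterable, Mapping, Set


def top_level_locations(rows: Iterable[Mapping[str, Any]]) -> Set[str]:
    """Distinct location values among rows recorded outside an iframe."""
    return {row["location"] for row in rows if not row["in_iframe"]}


def iframe_locations(rows: Iterable[Mapping[str, Any]]) -> Set[str]:
    """Distinct location values among rows recorded inside an iframe."""
    return {row["location"] for row in rows if row["in_iframe"]}


rows = [
    {"location": "https://www.dresslily.com/bottom-c-36.html", "in_iframe": False},
    {"location": "https://www.dresslily.com/bottom-c-36.html", "in_iframe": False},
    # Hypothetical framed document, for illustration only.
    {"location": "https://ads.example.com/frame.html", "in_iframe": True},
]
# Expect at most one top-level location per file.
assert len(top_level_locations(rows)) <= 1
print(iframe_locations(rows))
```

Since iframe rows cannot be tied back to a parent page, iframe_locations only lists which framed documents appeared in the file.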
    
  • operation:

    • Type: string
    • Description: The kind of access made to the API named in the "symbol" field: call when the symbol is a method that is invoked, and get or set when the symbol is a property whose value is read or written.
    • Possible Values: get, call, set
  • script_col:

    • Type: string
    • Description: The column, on the line given by script_line, where the function call starts. Note: some values currently contain a URL instead of a number, as in the last example below.
    • Examples:
    57
    211
    //hdjs.hiido.com/hiido_internal.js?siteid=mhssj
    
  • script_line:

    • Type: string
    • Description: The line, in the file given by script_url, where the function call is located. Note: some values currently contain the protocol identifier of a URL (http or https) instead of a number, as in the last two examples below.
    • Examples:
    12
    129
    http
    https
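
Because script_line and script_col are strings that sometimes hold URL fragments instead of numbers, converting them to integers needs a fallback. A minimal Python sketch that returns None for any non-numeric value; treating such values as missing, rather than trying to recover the real position, is an assumption of this sketch, and parse_position is a hypothetical name.

```python
from typing import Optional


def parse_position(raw: Optional[str]) -> Optional[int]:
    """Convert a script_line or script_col string to int, or None if it is not numeric."""
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        # Values such as "http", "https", or "//hdjs.hiido.com/..." end up here.
        return None


for value in ["57", "211", "http", "//hdjs.hiido.com/hiido_internal.js?siteid=mhssj"]:
    print(repr(value), "->", parse_position(value))
```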
    
  • script_loc_eval:

    • Type: string
    • Description: Set when a function call comes from code created with eval() or with new Function(). For example, both eval("console.log('my message')") and var log = new Function("message", "console.log(message)"); log("my message"); cause script_loc_eval to be set when the function calls are collected. The format of script_loc_eval is (line) (LINE_NUMBER) (>) (eval | Function), and the pattern can repeat several times when such code is nested. The value is generated from the stack property of JavaScript Error objects; the bottom of the MDN page on the Error object's stack property explains how the eval line numbers are produced.
    • Examples:
    ""
    line 2 > eval
    line 70 > Function
    line 140 > eval line 232 > Function
    line 1 > Function line 1 > eval line 1 > eval
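
A minimal Python sketch that splits a script_loc_eval value into its repeated (line number, eval or Function) steps, assuming values use exactly the spacing shown in the examples above. parse_script_loc_eval is a hypothetical helper; an empty string yields an empty list.

```python
import re
from typing import List, Tuple

# One "line N > eval" or "line N > Function" step.
_EVAL_STEP = re.compile(r"line (\d+) > (eval|Function)")


def parse_script_loc_eval(value: str) -> List[Tuple[int, str]]:
    """Split a script_loc_eval string into (line_number, kind) steps, in order of appearance."""
    return [(int(line), kind) for line, kind in _EVAL_STEP.findall(value)]


print(parse_script_loc_eval("line 140 > eval line 232 > Function"))
# [(140, 'eval'), (232, 'Function')]
```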
    
  • script_url:

    • Type: string
    • Description: The URL of the file in which the JavaScript function call ran. This may be the same value as "location", or it may be an external URL that was loaded into the website with a <script> tag.
    • Examples:
    http://www.google-analytics.com/analytics.js
    http://ajax.googleapis.com/ajax/libs/jquery/1.6/jquery.min.js
    http://pw.myersinfosys.com/javascripts/jquery-cookie.js?rwdv2
    https://g.alicdn.com/alilog/oneplus/blk.html#coid=52m7EjiWaj8CASPiP1nwaYXC&noid=&grd=n
    inline-cloudflare-rocketloader-executed-3.js
    /_/scs/shopping-verified-reviews-static/_/js/k=boq-shopping-verified-reviews.VerifiedReviewsBadgeUi.en_US.-JtwBcVsOWQ.O/m=_b,_tp/rt=j/d=1/excm=badgeview,_b,_tp/ed=1/rs=AC8lLkQbsBabKLQ4BgeJxo8BUz31aigxHA
    blob:http://nadgames.com/3334aa5f-24af-4c2f-9e52-fe196a0068b6
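
Since script_url appears in several forms (absolute URLs, relative paths, bare file names, blob: URLs), a minimal Python sketch that extracts a host only for absolute http(s) URLs and returns None otherwise. Returning None for blob: URLs, rather than unpacking the origin they embed, is a simplifying assumption, and script_host is a hypothetical name. Comparing its result with the host of location is one possible way to tell whether a script came from the crawled site or from elsewhere.

```python
from typing import Optional
from urllib.parse import urlparse


def script_host(script_url: str) -> Optional[str]:
    """Host name of an absolute http(s) script_url, or None for other forms."""
    parsed = urlparse(script_url)
    if parsed.scheme in ("http", "https"):
        return parsed.hostname
    # Relative paths, bare file names, and blob: URLs have no http(s) host here.
    return None


for url in [
    "http://www.google-analytics.com/analytics.js",
    "inline-cloudflare-rocketloader-executed-3.js",
    "blob:http://nadgames.com/3334aa5f-24af-4c2f-9e52-fe196a0068b6",
]:
    print(url, "->", script_host(url))
```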
    
  • symbol:

    • Type: string
    • Description: Either a Web API interface property (which has a value) or a method (which may take arguments, listed in the "arguments" field). The symbol corresponds to the "operation" field.
    • Examples:
    window.Storage.getItem 
    window.navigator.userAgent
    CanvasRenderingContext2D.textBaseline
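
Because symbol and operation together describe a single access, it can be useful to count rows by the pair. A minimal Python sketch, assuming rows are dictionaries with symbol and operation keys; count_accesses and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def count_accesses(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Count rows per (symbol, operation) pair."""
    return Counter((row["symbol"], row["operation"]) for row in rows)


rows = [
    {"symbol": "window.navigator.userAgent", "operation": "get"},
    {"symbol": "window.navigator.userAgent", "operation": "get"},
    {"symbol": "window.Storage.getItem", "operation": "call"},
]
for (symbol, operation), n in count_accesses(rows).most_common():
    print(f"{operation:4} {symbol}: {n}")
```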
    
  • time_stamp:

    • Type: string
    • Description: The time at which the JavaScript function information was collected, as a string in the ISO 8601 format YYYY-MM-DDTHH:mm:ss.sssZ. This is the format produced by JavaScript's Date.prototype.toISOString(); Date.now() by itself returns a number of milliseconds since the Unix epoch rather than a string.
      • YYYY-MM-DD is the year, month, and day.
      • "T" is a delimiter that separates the date and time sections.
      • HH:mm:ss.sss represents the hours, minutes, seconds, and milliseconds.
      • Z denotes the time zone UTC+0. ISO 8601 allows the time zone designator to be omitted, but every example below includes it.
    • Examples:
    2017-12-16T00:17:37.973Z
    2017-12-16T00:24:09.355Z
    2017-12-16T08:10:24.749Z
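
A minimal Python sketch for turning time_stamp values into timezone-aware datetimes, assuming every value ends in a literal Z as in the examples above (values without it raise ValueError). strptime is used because datetime.fromisoformat only accepts a trailing Z from Python 3.11 onward; parse_time_stamp is a hypothetical name.

```python
from datetime import datetime, timezone


def parse_time_stamp(value: str) -> datetime:
    """Parse a YYYY-MM-DDTHH:mm:ss.sssZ string into a timezone-aware UTC datetime."""
    # %f accepts the three millisecond digits and scales them to microseconds.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


start = parse_time_stamp("2017-12-16T00:17:37.973Z")
end = parse_time_stamp("2017-12-16T00:24:09.355Z")
print(end - start)  # 0:06:31.382000
```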
    
  • value:

    • Type: string
    • Description: The value that the function returned, or the value of the property that was accessed, as a string.
    • Examples:
    ""
    {}
    Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
    \_ga=GA1.2.1076416180.1513383458; \_gid=GA1.2.1940452730.1513383458
    {"name": "example", "Browser": "Mozilla/5.0"}
    
  • value_1000:

    • Type: string
    • Description: The value that the function returned, truncated to 1000 characters.
    • Examples:
    ""
    {}
    Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
    \_ga=GA1.2.1076416180.1513383458; \_gid=GA1.2.1940452730.1513383458
    {"name": "example", "Browser": "Mozilla/5.0"}
    
  • value_len:

    • Type: Integer
    • Description: The number of characters in the full, untruncated string representation of the value a function returned.
    • Examples:
    59508
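
A minimal Python sketch of how value_1000 and value_len relate to value, assuming value_1000 is simply the first 1000 characters of value and value_len is its full length; derive_value_fields is hypothetical. Python's len counts code points while JavaScript's String length counts UTF-16 code units, so the two can disagree for characters outside the Basic Multilingual Plane (such as many emoji); which convention the dataset used is not stated.

```python
from typing import Dict, Union


def derive_value_fields(value: str) -> Dict[str, Union[str, int]]:
    """Recompute value_1000 and value_len from a full value string."""
    return {
        "value_1000": value[:1000],  # at most the first 1000 characters
        "value_len": len(value),     # length of the untruncated string
    }


fields = derive_value_fields("x" * 59508)
print(len(fields["value_1000"]), fields["value_len"])  # 1000 59508
```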
    
  • arguments:

    • Type: object
    • Description: Optional property that lists the arguments passed to the method in the "symbol" field, as an object keyed by argument position.
    • Examples:
    {\"0\":\"liveAgentPc\"}
    {\"0\":\"liveAgentPage_0\",\"1\":\"http://www.alamy.com/help/what-is-model-release-property-release.aspx\"}
    
  • file_name:

    • Type: string
    • Description: Concatenation of the crawl_id and the name of the JSON file that corresponds to a location in the raw collected data, in the format (crawl_id)_(JSON file name).
    • Examples:
    1_f001bb59462bc80ee8ec9e6592b571d0a465cf3e05665953e71b9fe9.json
    
  • call_id:

    • Type: string
    • Description: Concatenation of the file name and a row identifier that distinguishes different calls recorded in the same file, in the format (file_name)__(identifier).
    • Examples:
    1_f001bb59462bc80ee8ec9e6592b571d0a465cf3e05665953e71b9fe9.json__121
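
A minimal Python sketch of building file_name and call_id from the formats above and splitting a call_id back apart, assuming the row identifier is an integer as in the example. The helper names are hypothetical; splitting on the last "__" is safe under that assumption because the identifier contains no underscores.

```python
from typing import Tuple


def make_file_name(crawl_id: int, json_name: str) -> str:
    """Format: (crawl_id)_(JSON file name)."""
    return f"{crawl_id}_{json_name}"


def make_call_id(file_name: str, identifier: int) -> str:
    """Format: (file_name)__(identifier)."""
    return f"{file_name}__{identifier}"


def split_call_id(call_id: str) -> Tuple[str, int]:
    """Recover (file_name, identifier) by splitting on the last "__"."""
    file_name, _, identifier = call_id.rpartition("__")
    return file_name, int(identifier)


name = make_file_name(1, "f001bb59462bc80ee8ec9e6592b571d0a465cf3e05665953e71b9fe9.json")
call_id = make_call_id(name, 121)
print(call_id)
print(split_call_id(call_id))
```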
    
  • arguments_i:

    • Type: string
    • Description: String representation of the argument passed to the function in position i, zero indexed.
    • Examples:
    ''
    {"domain":"backcountry.com"}
    None
    
  • arguments_n_keys:

    • Type: Integer
    • Description: The number of arguments passed in the function call, that is, the number of keys in the "arguments" object.
    • Examples:
    0
    5
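
A minimal Python sketch relating arguments, arguments_i, and arguments_n_keys, assuming the arguments field holds a JSON object string such as {"0":"liveAgentPc"} once the backslash escaping shown in its examples is removed. Under that assumption, arguments_i corresponds to the list entry at position i and arguments_n_keys to the list length; positions past the end of the list would match the None values seen in arguments_i. split_arguments is hypothetical, and re-serializing non-string values as compact JSON is a choice of this sketch.

```python
import json
from typing import List, Optional


def split_arguments(arguments: Optional[str]) -> List[str]:
    """Turn an arguments JSON object keyed by position ("0", "1", ...) into a list of strings."""
    if not arguments:
        return []
    obj = json.loads(arguments)
    # Sort keys numerically so that "10" comes after "2".
    ordered = [obj[key] for key in sorted(obj, key=int)]
    # Keep strings as-is; re-serialize nested values (objects, numbers) as compact JSON.
    return [v if isinstance(v, str) else json.dumps(v, separators=(",", ":")) for v in ordered]


args = split_arguments('{"0":"liveAgentPage_0","1":"http://www.alamy.com/help/what-is-model-release-property-release.aspx"}')
arguments_n_keys = len(args)
print(arguments_n_keys, args[0])  # 2 liveAgentPage_0
```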
    
  • valid:

    • Type: Boolean
    • Description: Whether parsing the row produced a valid result.
    • Examples:
    True
    
  • errors:

    • Type: string
    • Description: An error message, if an error arose while parsing the row.
    • Examples: