# News-Stream Example Queries

*You need a username and a password to access the dashboard and the Solr index. Please ask us.*


In this notebook we will show some example queries, to give an idea and easy access to all the data in the News-Stream project.

Solr queries can be made with the Solr search page under 

http://hdp-node06.neofonie.de:8983/solr/#/hackathon_shard3_replica2/query .



There is also a Banana dashboard with plenty of prepared graphics and loaded data from the News-Stream system:

https://nstr.neofonie.de/dev/#/dashboard/solr/Hackathon .

You can use the inspector icon (i) above each chart to take a look at the Solr query that generated the chart and get inspired.




First we import some stuff we will need from python.


In [8]:
from itertools import chain
import urllib



## Querying Data from News-Stream



Please fill in the user id and the password for retrieving data from the News-Stream system. Both must be put in a 
file named `credentials.py` in the same directory where this notebook is hosted. You can use `example_credentials.py` as
a template for this file, or you can copy the code from here:

```python
dpa = { 'login' : '---fill--in--username---', 'password' : '---fill-in-password---' }
```

First of all some helper functions to make the requested prameters in the rest of the notebook more readable.


In [19]:
## Load credentials
try :
    from credentials import dpa as auth
except ImportError :
    raise RuntimeError("Credentials must be supplied as dict in credentials.py. See example_credentials.py or use this as a template: dpa=dict(login='user',password='secret')")

base="nstr.neofonie.de/solr-dev/hackathon/select?"
select = "https://"+auth['login']+":"+auth['password']+"@"+base
print("\nUsing as base url for News-Stream: " + base + "\n")

############################################################

default_params = { 'rows': '3', 'wt': 'json', 'indent': 'on'}

def enc_query(params):
    q = ''
    for k,v in params.items():
        q += str(k) + "=" + urllib.parse.quote_plus(str(v)) + "&"
    for def_k,def_v in default_params.items():
       if def_k not in params:
        q += str(def_k) + "=" + urllib.parse.quote_plus(str(def_v)) + "&"
    return q

def exec_query(query):
    encoded = enc_query(query)
    print("https://username:password@"+ base + encoded)
    !curl -k "{select + encoded}"



Using as base url for News-Stream: nstr.neofonie.de/solr-dev/hackathon/select?





## Examples Fetching Data with Search Words



All queries are accessible from the commandline via curl. 

All available fields are documented in the document in the github repository: 

[EnglishHowTohackathon](https://github.com/dpa-newslab/tickertools2016/blob/master/neofonie/EnglischHowToHackathon.md)


#### Searchword: "Hillary Clinton" - All Data


In [49]:

exec_query({'q': 'Hillary Clinton'})



#### Searchword: "Hillary Clinton AND Donald Trump" - All Data


In [21]:
exec_query({'q': 'Hillary Clinton OR Donald Trump'})


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?q=Hillary+Clinton+OR+Donals+Trump&wt=json&rows=3&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":647,
    "params":{
      "q":"Hillary Clinton OR Donals Trump",
      "indent":"on",
      "rows":"3",
      "wt":"json"}},
  "response":{"numFound":43163,"start":0,"maxScore":0.45824558,"docs":[
      {
        "sourceId":"neofonie",
        "entityLabels":["Rap",
          "Jay-Z",
          "Politische Kampagne",
          "Staat",
          "Florida",
          "Donald Trump",
          "Hillary Clinton",
          "Miami"],
        "neoTeaserGenerated":false,
        "neoDocId":"3847959807451155506",
        "neoApplication":"rponline",
        "language":"de",
        "neoPublicationId":99,
        "title":"Trump vergleicht sich mit Rapper Jay-Z",
        "neoUrl":"http://www.rp-online.de/politik/ausland/donald-trump-vergleicht-sich-mit-rapper-jay-z-aid-1.6376536",
        "entityRfc4180":["k,26,32,GEN


#### Searchword: "Hillary Clinton" AND "Donald Trump" - Just title and text


In [22]:
exec_query(
        {
            'q': '"Hillary Clinton" AND "Donald Trump"', 
            'fl': 'title AND text',
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?q=%22Hillary+Clinton%22+AND+%22Donald+Trump%22&fl=title+AND+text&wt=json&rows=3&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":500,
    "params":{
      "q":"\"Hillary Clinton\" AND \"Donald Trump\"",
      "indent":"on",
      "fl":"title AND text",
      "rows":"3",
      "wt":"json"}},
  "response":{"numFound":20211,"start":0,"maxScore":2.077994,"docs":[
      {
        "title":"LA Times Tracking Poll: Trump Up 5 Points",
        "text":" Donald Trump holds a 5-point lead over Hillary Clinton, the Los Angeles Times Daybreak tracking poll showed Thursday. Donald Trump, 47.5 percent. Hillary Clinton, 42.5 percent.   © 2016 Newsmax. All rights reserved. Click Here to comment on this article"},
      {
        "title":"Trump vergleicht sich mit Rapper Jay-Z",
        "text":"Die US-Präsidentschaftskandidaten Hillary Clinton und Donald Trump haben ihren Wahlkampf am Samstag im hart umkämpften Staat Florid


#### Searchword: "Hillary Clinton" AND "Donald Trump" -  Titles only for articles in english language.


In [23]:
exec_query(
        {
            'q': '"Hillary Clinton" AND "Donald Trump"',
            'fq': 'language: en AND sourceId:neofonie',
            'fl': 'title',
            'sort': 'publicationDate DESC',
            'rows': '10'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?rows=10&q=%22Hillary+Clinton%22+AND+%22Donald+Trump%22&sort=publicationDate+DESC&fl=title&fq=language%3A+en+AND+sourceId%3Aneofonie&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":506,
    "params":{
      "q":"\"Hillary Clinton\" AND \"Donald Trump\"",
      "indent":"on",
      "fl":"title",
      "sort":"publicationDate DESC",
      "fq":"language: en AND sourceId:neofonie",
      "rows":"10",
      "wt":"json"}},
  "response":{"numFound":5978,"start":0,"docs":[
      {
        "title":"Transition: Obama, Trump to meet at White House"},
      {
        "title":"The West Wing creator Aaron Sorkin writes emotional letter to his daughter after Trump's victory"},
      {
        "title":"Michael Moore posts plan to save America to Facebook"},
      {
        "title":"'Not my president:' Trump denounced in protests across US"},
      {
        "title":"How Hillary Clinton spent almost twice as much


#### Using Meta Information and some semantics of Solr search queries

In the next queries we are setting the number of results to zero, because we are just interested in the meta information.

For each of the following three examples, we find a different number of results depending on the semantic of the seach query.

* In the first example the query string is OR'ed and we get all results containing any occurrence of the query tokens.
* In the second example the semantics of the query is interpreted by Solr ("text:hillary +text:clinton +text:donald text:trump").
* In the third query we are searching for exact matches of "Hillary Clinton" AND "Donald Trump".

Most of the time you want the third query for results that match both politicians.


In [24]:
exec_query(
        {
            'q': 'Hillary Clinton Donald Trump', 
            'rows': '0'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?rows=0&q=Hillary+Clinton+Donald+Trump&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":210,
    "params":{
      "q":"Hillary Clinton Donald Trump",
      "indent":"on",
      "rows":"0",
      "wt":"json"}},
  "response":{"numFound":44080,"start":0,"maxScore":1.4701442,"docs":[]
  }}


In [25]:
exec_query(
        {
            'q': 'Hillary Clinton AND Donald Trump', 
            'rows': '0'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?rows=0&q=Hillary+Clinton+AND+Donald+Trump&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":146,
    "params":{
      "q":"Hillary Clinton AND Donald Trump",
      "indent":"on",
      "rows":"0",
      "wt":"json"}},
  "response":{"numFound":23408,"start":0,"maxScore":1.470185,"docs":[]
  }}


In [26]:
exec_query(
        {
            'q': '"Hillary Clinton" AND "Donald Trump"', 
            'rows': '0'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?rows=0&q=%22Hillary+Clinton%22+AND+%22Donald+Trump%22&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":258,
    "params":{
      "q":"\"Hillary Clinton\" AND \"Donald Trump\"",
      "indent":"on",
      "rows":"0",
      "wt":"json"}},
  "response":{"numFound":20221,"start":0,"maxScore":2.0788512,"docs":[]
  }}



#### Documents about "Washington" from Neofonie's news crawl not older than 24 hours


The following query returns results for all news articles containing the search term 'Washington'.

Results contain terms like 'Kamasi Washington', as 'Washington Redskins' etc.

In [27]:
exec_query(
        {
            'q': 'Washington', 
            'fq': '+sourceId:neofonie +publicationDateNOW/HOUR-24HOUR TO NOW/HOUR+1HOUR'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?q=Washington&fq=%2BsourceId%3Aneofonie+%2BpublicationDateNOW%2FHOUR-24HOUR+TO+NOW%2FHOUR%2B1HOUR&wt=json&rows=3&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":853,
    "params":{
      "q":"Washington",
      "indent":"on",
      "fq":"+sourceId:neofonie +publicationDateNOW/HOUR-24HOUR TO NOW/HOUR+1HOUR",
      "rows":"3",
      "wt":"json"}},
  "response":{"numFound":507,"start":0,"maxScore":0.46183452,"docs":[
      {
        "sourceId":"neofonie",
        "entityLabels":["1433",
          "Washington",
          "7th District Inc.",
          "Florida",
          "Ohio",
          "Gas",
          "Boost",
          "Seattle",
          "Boosting",
          "SeaTac",
          "Nick Hanauer",
          "Palo Alto",
          "Fairness",
          "Arizona",
          "Colorado",
          "Maine",
          "Snohomish County",
          "Restaurant",
          "Retail",
          "King County",
    


Whereas the following search narrows the search down to all articles containing the entity with label 'Washington', which might match your initial intention of searching for the US capital in news.

Please see the next chapter for more examples using named entities.


In [28]:
exec_query(
        {
            'q': 'entityLabels: Washington', 
            'fq': '+sourceId:neofonie +publicationDateNOW/HOUR-24HOUR TO NOW/HOUR+1HOUR'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?q=entityLabels%3A+Washington&fq=%2BsourceId%3Aneofonie+%2BpublicationDateNOW%2FHOUR-24HOUR+TO+NOW%2FHOUR%2B1HOUR&wt=json&rows=3&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":379,
    "params":{
      "q":"entityLabels: Washington",
      "indent":"on",
      "fq":"+sourceId:neofonie +publicationDateNOW/HOUR-24HOUR TO NOW/HOUR+1HOUR",
      "rows":"3",
      "wt":"json"}},
  "response":{"numFound":17,"start":0,"maxScore":7.572799,"docs":[
      {
        "sourceId":"neofonie",
        "entityLabels":["Amide",
          "Bremerton",
          "Roger Goodell",
          "National Football League",
          "Washington",
          "Centers for Disease Control and Prevention",
          "Bellingham",
          "Facebook",
          "Doctors",
          "Seattle",
          "Desoxyribonukleinsäure",
          "Informationstechnik",
          "Highschool",
          "Donald Trump",
          "Florida",
     



#### Hourly Documents Count about "Hillary Clinton" from Neofonie's news crawl not older than 24 hours: 


In [29]:
exec_query(
        {
            'q': 'entityLabels: Hillary Clinton', 
            'fq': '+publicationDate:[NOW/HOUR-24HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie',
            'rows': '0',
            'facet': 'true',
            'facet.range': 'publicationDate',
            'facet.range.start': 'NOW/HOUR-24HOUR',
            'facet.range.end': 'NOW/HOUR+1HOUR',
            'facet.range.gap': '+1HOUR'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=entityLabels%3A+Hillary+Clinton&facet.range.end=NOW%2FHOUR%2B1HOUR&facet.range=publicationDate&facet.range.start=NOW%2FHOUR-24HOUR&facet.range.gap=%2B1HOUR&fq=%2BpublicationDate%3A%5BNOW%2FHOUR-24HOUR+TO+NOW%2FHOUR%2B1HOUR%5D+%2BsourceId%3Aneofonie&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":181,
    "params":{
      "facet.range":"publicationDate",
      "q":"entityLabels: Hillary Clinton",
      "facet.range.gap":"+1HOUR",
      "indent":"on",
      "fq":"+publicationDate:[NOW/HOUR-24HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.range.start":"NOW/HOUR-24HOUR",
      "facet.range.end":"NOW/HOUR+1HOUR"}},
  "response":{"numFound":5098,"start":0,"maxScore":0.12707794,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "publicat



## Examples fetching data based on named entities


#### Fetch Top 5 news with NER annotations for "Hillary Clinton" AND "Donald Trump"

In [30]:
exec_query(
        {
            'q': 'entityLabels: "Hillary Clinton" AND entityLabels: "Donald Trump"', 
            'fq': '+publicationDate:[NOW/HOUR-24HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie',
            'fl': 'neoUrl AND title AND entityLabels',
            'sort': 'publicationDate DESC',
            'rows': '5',
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?rows=5&q=entityLabels%3A+%22Hillary+Clinton%22+AND+entityLabels%3A+%22Donald+Trump%22&sort=publicationDate+DESC&fl=neoUrl+AND+title+AND+entityLabels&fq=%2BpublicationDate%3A%5BNOW%2FHOUR-24HOUR+TO+NOW%2FHOUR%2B1HOUR%5D+%2BsourceId%3Aneofonie&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":155,
    "params":{
      "q":"entityLabels: \"Hillary Clinton\" AND entityLabels: \"Donald Trump\"",
      "indent":"on",
      "fl":"neoUrl AND title AND entityLabels",
      "sort":"publicationDate DESC",
      "fq":"+publicationDate:[NOW/HOUR-24HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie",
      "rows":"5",
      "wt":"json"}},
  "response":{"numFound":4017,"start":0,"docs":[
      {
        "entityLabels":["Amerikaner",
          "Politische Kampagne",
          "Weißes Haus",
          "Hillary Clinton",
          "Grundsteinlegung",
          "Pentagon",
          "Chris Christie",
          "Lobbyismus",


#### Fetch TOP 5 news for "Volkswagen"

In [31]:
exec_query(
        {
            'q': 'entityLabels: Volkswagen', 
            'fl': 'title',
            'fq': '+publicationDate:[NOW/HOUR-24HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie',
            'sort': 'publicationDate DESC',
            'rows': '5',
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?rows=5&q=entityLabels%3A+Volkswagen&sort=publicationDate+DESC&fl=title&fq=%2BpublicationDate%3A%5BNOW%2FHOUR-24HOUR+TO+NOW%2FHOUR%2B1HOUR%5D+%2BsourceId%3Aneofonie&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":396,
    "params":{
      "q":"entityLabels: Volkswagen",
      "indent":"on",
      "fl":"title",
      "sort":"publicationDate DESC",
      "fq":"+publicationDate:[NOW/HOUR-24HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie",
      "rows":"5",
      "wt":"json"}},
  "response":{"numFound":249,"start":0,"docs":[
      {
        "title":"Le plus mauvais chiffre depuis 2011"},
      {
        "title":"BUND fordert Verkaufsstopp von Dieselautos"},
      {
        "title":" Das ist der<br />neue Golf 7,8"},
      {
        "title":"Betrugssoftware bei Audi – Klagewelle rollt an"},
      {
        "title":"Mehr Audi für den Golf 7 Version 2.0"}]
  }}


#### Fetch TOP 5 news for the last two hours with recognized Organisations

In [32]:
exec_query(
        {
            'q': 'entityTypes: ORGANISATION', 
            'fl': 'neoUrl title entityRfc4180',
            'fq': '+publicationDate:[NOW/HOUR-2HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie',
            'sort': 'publicationDate DESC',
            'rows': '5',
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?rows=5&q=entityTypes%3A+ORGANISATION&sort=publicationDate+DESC&fl=neoUrl+title+entityRfc4180&fq=%2BpublicationDate%3A%5BNOW%2FHOUR-2HOUR+TO+NOW%2FHOUR%2B1HOUR%5D+%2BsourceId%3Aneofonie&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":120,
    "params":{
      "q":"entityTypes: ORGANISATION",
      "indent":"on",
      "fl":"neoUrl title entityRfc4180",
      "sort":"publicationDate DESC",
      "fq":"+publicationDate:[NOW/HOUR-2HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie",
      "rows":"5",
      "wt":"json"}},
  "response":{"numFound":4143,"start":0,"docs":[
      {
        "title":"Unfallursache: Winter missachtet",
        "neoUrl":"http://www.sz-online.de/nachrichten/unfallursache-winter-missachtet-3537386.html",
        "entityRfc4180":["k,98,108,CONCEPT,Jahreszeit,Q24384,Jahreszeit,34.646667\r\n",
          "k,122,131,CONCEPT,Landkreis,Q106658,Landkreis,34.248913\r\n",
          "k,787,794,PL

#### Fetch TOP 5 news for which CRF<sup>1</sup> recognized persons that are not already known as named entities.

<sup>1</sup> CRF = Conditional Random Field - a way of recognizing unkonwn entities .

In [33]:
exec_query(
        {
            'q': 'unknownTypes: PERSON', 
            'fl': 'neoUrl title entityRfc4180',
            'fq': '+publicationDate:[NOW/HOUR-2HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie',
            'sort': 'publicationDate DESC',
            'rows': '5',
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?rows=5&q=unknownTypes%3A+PERSON&sort=publicationDate+DESC&fl=neoUrl+title+entityRfc4180&fq=%2BpublicationDate%3A%5BNOW%2FHOUR-2HOUR+TO+NOW%2FHOUR%2B1HOUR%5D+%2BsourceId%3Aneofonie&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":119,
    "params":{
      "q":"unknownTypes: PERSON",
      "indent":"on",
      "fl":"neoUrl title entityRfc4180",
      "sort":"publicationDate DESC",
      "fq":"+publicationDate:[NOW/HOUR-2HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie",
      "rows":"5",
      "wt":"json"}},
  "response":{"numFound":2897,"start":0,"docs":[
      {
        "title":"Kündigungswelle im Rathaus",
        "neoUrl":"http://www.sz-online.de/nachrichten/kuendigungswelle-im-rathaus-3537212.html",
        "entityRfc4180":["k,108,113,CONCEPT,Klima,Q7937,Klima,45.243084\r\n",
          "k,133,146,PLACE,Bischofswerda,Q81717,Bischofswerda,39.34405\r\n",
          "k,951,956,CONCEPT,Klima,Q7937,Klima,45



## Examples fetching data with facets



#### Number of documents from the different News-Stream sources


In [34]:
exec_query(
        {
            'q': '*', 
            'fq': '+publicationDate:[NOW/HOUR-30DAY TO NOW/HOUR+1HOUR] +sourceId:neofonie',
            'rows': '0',
            'facet': 'true',
            'facet.field': 'neoPublicationName',
            'facet.missing': 'true',
            'facet.sort': 'count',
            'facet.method': 'enum'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=%2A&facet.field=neoPublicationName&facet.sort=count&facet.method=enum&facet.missing=true&fq=%2BpublicationDate%3A%5BNOW%2FHOUR-30DAY+TO+NOW%2FHOUR%2B1HOUR%5D+%2BsourceId%3Aneofonie&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":283,
    "params":{
      "q":"*",
      "facet.field":"neoPublicationName",
      "indent":"on",
      "facet.method":"enum",
      "facet.missing":"true",
      "fq":"+publicationDate:[NOW/HOUR-30DAY TO NOW/HOUR+1HOUR] +sourceId:neofonie",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.sort":"count"}},
  "response":{"numFound":307995,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "neoPublicationName":[
        "AD HOC NEWS",26222,
        "FOCUS Online",7548,
        "Westdeutsche Allgemeine",7217,
        "Wallstreet Online",6213,
        "finanzen.net",4153,
   

#### Counts of news per hour containing the search term "Hillary Clinton" in the last 24 hours.

In [35]:

exec_query(
        {
            'q': 'Hillary Clinton', 
            'fq': '+publicationDate:[NOW/HOUR-24HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie',
            'fl': 'titles',
            'rows': '0',
            'facet': 'true',
            'facet.range': 'publicationDate',
            'facet.range.start': 'NOW/HOUR-24HOUR',
            'facet.range.end': 'NOW/HOUR+1HOUR',
            'facet.range.gap': '+1HOUR'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=Hillary+Clinton&facet.range.gap=%2B1HOUR&facet.range.end=NOW%2FHOUR%2B1HOUR&facet.range=publicationDate&facet.range.start=NOW%2FHOUR-24HOUR&fl=titles&fq=%2BpublicationDate%3A%5BNOW%2FHOUR-24HOUR+TO+NOW%2FHOUR%2B1HOUR%5D+%2BsourceId%3Aneofonie&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":46,
    "params":{
      "facet.range":"publicationDate",
      "q":"Hillary Clinton",
      "facet.range.gap":"+1HOUR",
      "indent":"on",
      "fl":"titles",
      "fq":"+publicationDate:[NOW/HOUR-24HOUR TO NOW/HOUR+1HOUR] +sourceId:neofonie",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.range.start":"NOW/HOUR-24HOUR",
      "facet.range.end":"NOW/HOUR+1HOUR"}},
  "response":{"numFound":5247,"start":0,"maxScore":1.0801213,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "publicati

#### Count news grouped by language for the search term "Hillary Clinton" OR "Donald Trump".

In [36]:
exec_query(
        {
            'q': 'entityLabels:"Hillary Clinton" OR entityLabels:"Donald Trump"',
            'fq':'publicationDate:[NOW/DAY-3DAY TO NOW/DAY+1DAY]',
            'rows': '0',
            'facet': 'true',
            'facet.field': 'language',
            'facet.limit': '10',
            'facet.missing': 'true',
            'facet.sort': 'count',
            'facet.method': 'fcs'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=entityLabels%3A%22Hillary+Clinton%22+OR+entityLabels%3A%22Donald+Trump%22&facet.field=language&facet.missing=true&facet.sort=count&facet.method=fcs&facet.limit=10&fq=publicationDate%3A%5BNOW%2FDAY-3DAY+TO+NOW%2FDAY%2B1DAY%5D&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":217,
    "params":{
      "q":"entityLabels:\"Hillary Clinton\" OR entityLabels:\"Donald Trump\"",
      "facet.limit":"10",
      "facet.field":"language",
      "indent":"on",
      "facet.missing":"true",
      "facet.method":"fcs",
      "fq":"publicationDate:[NOW/DAY-3DAY TO NOW/DAY+1DAY]",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.sort":"count"}},
  "response":{"numFound":28822,"start":0,"maxScore":4.876194,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "language":[
        "de",20700,
        "en",7679,
        "fr",309,
        "",134

#### Counting all occurrences of named entities in news which contain NEs "Hillary Clinton" OR "Donald Trump"

In [37]:
exec_query(
        {
            'q': 'entityLabels:"Hillary Clinton" OR entityLabels:"Donald Trump"',
            'fq':'publicationDate:[NOW/DAY-3DAY TO NOW/DAY+1DAY]',
            'rows': '0',
            'facet': 'true',
            'facet.field': 'knownSurfaceforms',
            'facet.limit': '10',
            'facet.missing': 'true',
            'facet.sort': 'count',
            'facet.method': 'enum'
         })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=entityLabels%3A%22Hillary+Clinton%22+OR+entityLabels%3A%22Donald+Trump%22&facet.field=knownSurfaceforms&facet.missing=true&facet.sort=count&facet.method=enum&facet.limit=10&fq=publicationDate%3A%5BNOW%2FDAY-3DAY+TO+NOW%2FDAY%2B1DAY%5D&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":1323,
    "params":{
      "q":"entityLabels:\"Hillary Clinton\" OR entityLabels:\"Donald Trump\"",
      "facet.limit":"10",
      "facet.field":"knownSurfaceforms",
      "indent":"on",
      "facet.missing":"true",
      "facet.method":"enum",
      "fq":"publicationDate:[NOW/DAY-3DAY TO NOW/DAY+1DAY]",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.sort":"count"}},
  "response":{"numFound":28824,"start":0,"maxScore":4.8762684,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "knownSurfaceforms":[
        "Donald Trump",26178,
        "H

#### Counting all CRFs in news which contain NEs "Hillary Clinton" OR "Donald Trump"

In [39]:
exec_query(
        {
            'q': 'entityLabels:"Hillary Clinton" OR entityLabels:"Donald Trump"',
            'fq':'publicationDate:[NOW/DAY-3DAY TO NOW/DAY+1DAY]',
            'rows': '0',
            'facet': 'true',
            'facet.field': 'unknownPersons',
            'facet.limit': '10',
            'facet.missing': 'true',
            'facet.sort': 'count',
            'facet.method': 'enum'
         })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=entityLabels%3A%22Hillary+Clinton%22+OR+entityLabels%3A%22Donald+Trump%22&facet.field=unknownPersons&facet.missing=true&facet.sort=count&facet.method=enum&facet.limit=10&fq=publicationDate%3A%5BNOW%2FDAY-3DAY+TO+NOW%2FDAY%2B1DAY%5D&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":339,
    "params":{
      "q":"entityLabels:\"Hillary Clinton\" OR entityLabels:\"Donald Trump\"",
      "facet.limit":"10",
      "facet.field":"unknownPersons",
      "indent":"on",
      "facet.missing":"true",
      "facet.method":"enum",
      "fq":"publicationDate:[NOW/DAY-3DAY TO NOW/DAY+1DAY]",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.sort":"count"}},
  "response":{"numFound":28824,"start":0,"maxScore":4.8762684,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "unknownPersons":[
        "White House",2398,
        "Donald Trump'


## Examples for selecting dpa data


#### Loading dpa-News from News-Stream

In [41]:
exec_query(
        {
            'q': 'entityLabels:"Hillary Clinton" OR entityLabels:"Donald Trump"',
            'fq': 'sourceId:dpa',
         })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?q=entityLabels%3A%22Hillary+Clinton%22+OR+entityLabels%3A%22Donald+Trump%22&fq=sourceId%3Adpa&wt=json&rows=3&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":112,
    "params":{
      "q":"entityLabels:\"Hillary Clinton\" OR entityLabels:\"Donald Trump\"",
      "indent":"on",
      "fq":"sourceId:dpa",
      "rows":"3",
      "wt":"json"}},
  "response":{"numFound":1625,"start":0,"maxScore":4.875881,"docs":[
      {
        "sourceId":"dpa",
        "entityLabels":["Vereinigtes Königreich",
          "Dpa-AFX Wirtschaftsnachrichten",
          "Financial Times",
          "100-Meter-Lauf",
          "Präsident der Vereinigten Staaten",
          "8. November",
          "Hillary Clinton",
          "Finanzmarkt",
          "Federal Reserve System",
          "Geldpolitik",
          "CMC Markets",
          "Zentralbank",
          "Börsenkurs",
          "Chief Financial Officer",
          "L’Oréal",
 

#### Loading dpa-News from News-Stream with dpa specific fields

In [42]:
exec_query(
        {
            'q': 'entityLabels:"Hillary Clinton" OR entityLabels:"Donald Trump"',
            'fq': 'sourceId:dpa',
            'fl': 'id dpaId publicationDate title mlRessort dpaIndustries',
            'sort': 'publicationDate DESC',
            'rows': '5'
        })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?rows=5&q=entityLabels%3A%22Hillary+Clinton%22+OR+entityLabels%3A%22Donald+Trump%22&sort=publicationDate+DESC&fl=id+dpaId+publicationDate+title+mlRessort+dpaIndustries&fq=sourceId%3Adpa&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":105,
    "params":{
      "q":"entityLabels:\"Hillary Clinton\" OR entityLabels:\"Donald Trump\"",
      "indent":"on",
      "fl":"id dpaId publicationDate title mlRessort dpaIndustries",
      "sort":"publicationDate DESC",
      "fq":"sourceId:dpa",
      "rows":"5",
      "wt":"json"}},
  "response":{"numFound":1625,"start":0,"docs":[
      {
        "title":"HDE: Trump-Sieg bremst Kauflust der Deutschen vor Weihnachten nicht",
        "dpaIndustries":["CSM",
          "REF",
          "RET"],
        "mlRessort":"wi",
        "id":"8e1be35825f580886c308a2ac790b0c1",
        "publicationDate":"2016-11-10T12:01:15Z",
        "dpaId":"urn:newsml:dpa.com:20090101:161

#### Aggregation of dpa news on category 'mlIndustries'

```
 FIN -> Asset Management, Finanzdienstleister
 AUT -> Automobil-/Zuliefererindustrie (Autos &amp; LKW, Ersatzteile, Reifen) 
 BAN -> Banken 
 CON -> Bau 
 PER -> Bekleidung, Kosmetik 
 MIN -> Bergbau, Rohstoffförderung (Kohle, Diamanten, Gold, Platin, Edelmetalle) 
 EQI -> Beteiligungsgesellschaften
 EQN -> Börsennotierte Fonds (ETF, etc.) 
 CHM -> Chemie, Kunststoffe
 CMP -> Computer, Hardware, Software, Halbleiter, Bauteile 
 ELU -> Elektrizitätsversorger 
 ELE -> Elektronik, Elektrik, Komponenten 
 AEG -> Erneuerbare Energien 
 HTH -> Gesundheitswesen, Medizintechnik, Krankenhausbedarf 
 BEV -> Getränke (Bier, Wein, Destillerien, Soft Drinks) 
 TRN -> Gütertransport, Logistik 
 HOU -> Haushaltswaren, Möbel, Eigenheime 
 PRO -> Immobilien 
 REF -> Lebensmittel- und Pharmahandel 
 ASS -> Lebensversicherer 
 ENG -> Maschinenbau, Starkstrom, Umwelttechnik 
 MET -> Metallverarbeitung- und förderung, NE-Metalle 
 INL -> Mischkonzerne, Verpackungsindustrie 
 FOO -> Nahrungsmittel (Hersteller, inkl. Agrarindustrie) 
 RET -> Non-Food-Einzelhandel, Endkunden-Dienstleister 
 PAP -> Papier, Zellulose, Holz 
 PHA -> Pharma, Biotechnologie 
 DEF -> Rüstungsindustrie, Flugzeughersteller 
 INS -> Sach- und Rückversicherungen 
 SOF -> Software, IT-Beratung, Internet, Portalbetreiber 
 TOB -> Tabakindustrie 
 TEL -> Telefongesellschaften (Festnetz) 
 MOB -> Telefongesellschaften (Mobilfunk) 
 LEI -> Tourismus, Fluggesellschaften, Bahn (Personenverkehr) 
 SVS -> Unternehmensdienstleister 
 CSM -> Verbrauchsgüter, Kosmetik, Seife, Handwerksbedarf, Möbel, Haushaltsgeräte, Unterhaltungselektronik 
 MED -> Verlage, Rundfunk, Info-Dienste, Zeitungen, Bücher, Werbung 
 UTI -> Versorger (Gas, Wasser etc.) 
 OIL -> Öl, Ölexploration, Gas 
 OES -> Öl-Anlagenbau, Pipelines 
```



In [44]:
exec_query(
        {
            'q': 'Siemens',
            'fq': 'sourceId:dpa',
            'rows': '0',
            'facet': 'true',
            'facet.field': 'mlIndustries',
            'facet.limit': '10',
            'facet.missing': 'true',
            'facet.sort': 'count',
            'facet.method': 'enum'
         })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=Siemens&facet.field=mlIndustries&facet.missing=true&facet.sort=count&facet.method=enum&facet.limit=10&fq=sourceId%3Adpa&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":157,
    "params":{
      "q":"Siemens",
      "facet.limit":"10",
      "facet.field":"mlIndustries",
      "indent":"on",
      "facet.missing":"true",
      "facet.method":"enum",
      "fq":"sourceId:dpa",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.sort":"count"}},
  "response":{"numFound":132,"start":0,"maxScore":1.1718588,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "mlIndustries":[
        "ENG",35,
        "INL",25,
        "ELE",9,
        "CON",3,
        "ELU",3,
        "OIL",3,
        "UTI",3,
        "MET",2,
        "TRN",2,
        "AUT",1,
        null,95]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{}

#### Aggregation of dpa news on category 'dpaRessort'

pl="politik", wi="wirtschaft", rs="redaktioneller service", vm="vermischtes", ku="kultur", sp="sport"

In [45]:
exec_query(
        {
            'q': 'entityLabels:"Hillary Clinton" OR entityLabels:"Donald Trump"',
            'fq': 'sourceId:dpa',
            'rows': '0',
            'facet': 'true',
            'facet.field': 'dpaRessort',
            'facet.missing': 'true',
            'facet.sort': 'count',
            'facet.method': 'enum'
         })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=entityLabels%3A%22Hillary+Clinton%22+OR+entityLabels%3A%22Donald+Trump%22&facet.field=dpaRessort&facet.sort=count&facet.method=enum&facet.missing=true&fq=sourceId%3Adpa&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":152,
    "params":{
      "q":"entityLabels:\"Hillary Clinton\" OR entityLabels:\"Donald Trump\"",
      "facet.field":"dpaRessort",
      "indent":"on",
      "facet.method":"enum",
      "facet.missing":"true",
      "fq":"sourceId:dpa",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.sort":"count"}},
  "response":{"numFound":1625,"start":0,"maxScore":4.875924,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "dpaRessort":[
        "pl",1049,
        "wi",356,
        "rs",167,
        "vm",36,
        "ku",11,
        "sp",6,
        null,0]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_inte

#### Aggregation of dpa news on category 'dpaServices'



Acronyms for dpa Services:

* dpasrv:bdt -> Basisdienst
* afxsrv:ADE -> AFX Kompakt
* edi-bid -> Teil des Basisdienstes
* dpasrv:hfk -> Hörfunkdienst/ Kurznachrichtendienst und Teilmenge des Basisdienstes.
* wap- Präfix sind Varianten des jeweiligen Landesdienstes.

For regional dpa services there exist the following acronyms:

* bwg: Baden-Württemberg
* brb: Berlin / Brandenburg
* rhs: Rheinland-Pfalz / Saarland
* bay: Bayern
* hsh: Hamburg / Schleswig-Holstein
* nwf: Nordrhein-Westfalen
* san: Sachsen
* aht: Sachsen-Anhalt
* hes: Hessen
* mbv: Mecklenburg-Vorpommern
* thg: Thüringen
* nsb: Niedersachsen / Bremen


In [47]:
exec_query(
        {
            'q': 'entityLabels:"Hillary Clinton" OR entityLabels:"Donald Trump"',
            'fq': 'sourceId:dpa',
            'rows': '0',
            'facet': 'true',
            'facet.field': 'dpaServices',
            'facet.missing': 'true',
            'facet.sort': 'count',
            'facet.method': 'enum'
         })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=entityLabels%3A%22Hillary+Clinton%22+OR+entityLabels%3A%22Donald+Trump%22&facet.field=dpaServices&facet.sort=count&facet.method=enum&facet.missing=true&fq=sourceId%3Adpa&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":135,
    "params":{
      "q":"entityLabels:\"Hillary Clinton\" OR entityLabels:\"Donald Trump\"",
      "facet.field":"dpaServices",
      "indent":"on",
      "facet.method":"enum",
      "facet.missing":"true",
      "fq":"sourceId:dpa",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.sort":"count"}},
  "response":{"numFound":1625,"start":0,"maxScore":4.8760953,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "dpaServices":[
        "dpasrv:bdt",875,
        "afxsrv:ADE",617,
        "dpasrv:edi",594,
        "dpasrv:edt",594,
        "dpasrv:erd",594,
        "dpasrv:bid",466,
        "dpasrv:hfk",81

#### Aggregation of dpa news on category 'dpaKeywords'

In [48]:
exec_query(
        {
            'q': 'entityLabels:"Hillary Clinton" OR entityLabels:"Donald Trump"',
            'fq': 'sourceId:dpa',
            'rows': '0',
            'facet': 'true',
            'facet.field': 'dpaKeywords',
            'facet.limit': '10',
            'facet.missing': 'true',
            'facet.sort': 'count',
            'facet.method': 'enum'
         })


https://username:password@nstr.neofonie.de/solr-dev/hackathon/select?facet=true&rows=0&q=entityLabels%3A%22Hillary+Clinton%22+OR+entityLabels%3A%22Donald+Trump%22&facet.field=dpaKeywords&facet.missing=true&facet.sort=count&facet.method=enum&facet.limit=10&fq=sourceId%3Adpa&wt=json&indent=on&
{
  "responseHeader":{
    "status":0,
    "QTime":170,
    "params":{
      "q":"entityLabels:\"Hillary Clinton\" OR entityLabels:\"Donald Trump\"",
      "facet.limit":"10",
      "facet.field":"dpaKeywords",
      "indent":"on",
      "facet.missing":"true",
      "facet.method":"enum",
      "fq":"sourceId:dpa",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.sort":"count"}},
  "response":{"numFound":1625,"start":0,"maxScore":4.875881,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "dpaKeywords":[
        "Präsident",515,
        "dpa",212,
        "Reaktionen",157,
        "Tagesvorschau",111,
        "Wahlen",82,
        "Audio",81,