Add the Italian (IT) public bodies from the Public Administration Index #16

Open
groundrace opened this issue May 2, 2013 · 9 comments
Labels
Data Data sources and ingestion automation

Comments

@groundrace

Add the list of Italian public administrations, using the data available in the CSV maintained by the Italian Public Administration Index.

@traversaro
Contributor

I have parsed the indicePA (Public Administration Index) CSV into the publicbodies format: https://github.com/pegua/misc/blob/master/publicbodies/it.csv (using a Python script: https://github.com/pegua/misc/blob/master/publicbodies/importIT.py ).
The data includes all the Italian schools, as they are considered public bodies, for a total of ~20000 public bodies (and some Italian public bodies are still missing from indicePA!).
Some remaining issues:

@rufuspollock
Member

@PeGua super useful.

  • The file is pretty big ;-) (~6 MB). How many items are public schools?
  • Re license: that's a good point. Perhaps we should have per-file licensing (or switch to ODbL or ...)
  • Contact field and address field: these only exist in DE at the moment. I'm actually not sure about the difference between them and am asking @stefanw

@rufuspollock
Member

@PeGua in fact, from discussion on the list I think we want to leave out e.g. schools and focus on the primary "government" bodies (though we could add a note about the bigger list).

Before we redo the CSV and the pull request, could you give a brief summary of the types of "public body" in the list (and rough stats, e.g. X schools, Y government departments, etc.)?

@traversaro
Contributor

Fortunately the indicePA data has a "category" field, so it is easy to remove schools while converting the file to the publicbodies.org format.
A summary of the public bodies by category is available in Italian here: http://www.indicepa.gov.it/report/rep-amministraz-percategoria.php
Considering only the largest categories, there are:

  • 9,369 schools (primary and secondary)
  • 7,770 municipalities
  • 831 professional associations (in Italy they are public bodies)

Out of a total of ~20000 public bodies.
So schools account for roughly half of all the public bodies.
I've updated the data to leave out the schools: https://github.com/pegua/misc/tree/master/publicbodies
If size is still an issue, it is also possible to leave out the complete URL of the source.
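
For readers following along, here is a minimal sketch of how a per-category count and the school filter could be done with Python's standard `csv` module. It is not the actual `importIT.py` script: the column names (`name`, `category`, `url`), the category labels, and the sample rows are all hypothetical, since the real indicePA export uses its own headers and Italian category names.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: the real indicePA CSV has different
# column names and category labels.
SAMPLE = """name,category,url
Comune di Esempio,Municipality,www.comune.example.it
Scuola Media Esempio,School,www.scuola.example.it
Ordine degli Esempi,Professional association,http://www.ordine.example.it
Liceo Esempio,School,www.liceo.example.it
"""

# Assumed label for the category to leave out.
EXCLUDED_CATEGORIES = {"School"}


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by_category(rows):
    """Count how many rows fall into each category."""
    return Counter(row["category"] for row in rows)


def drop_categories(rows, excluded):
    """Keep only the rows whose category is not in the excluded set."""
    return [row for row in rows if row["category"] not in excluded]


rows = load_rows(SAMPLE)
counts = count_by_category(rows)
kept = drop_categories(rows, EXCLUDED_CATEGORIES)

for category, n in counts.most_common():
    print(f"{category}: {n}")
print(f"kept {len(kept)} of {len(rows)} rows")
```

Counting before filtering gives the rough per-category stats asked for earlier in the thread, and the same `category` column then drives the exclusion.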

@rufuspollock
Member

This looks great. One thing I note is a lot of the url field entries do not start with http:// but just with www.

I also wondered about central government departments? Are they included here (and does such a list exist?)

@traversaro
Contributor

I have committed a quick fix for the "http://" issue.
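
The committed fix itself is not shown in this thread. As an illustration only, a normalization step of this kind might look like the sketch below; the function name `normalize_url` and the choice of `http` as the default scheme are assumptions.

```python
def normalize_url(url, default_scheme="http"):
    """Prepend a scheme to URLs that lack one (e.g. those starting with 'www.').

    Hypothetical helper, not the code from the actual commit.
    """
    url = url.strip()
    if not url:
        # Leave empty values empty rather than producing "http://".
        return url
    if "://" in url:
        # Already has a scheme (http, https, ...), so keep it unchanged.
        return url
    return f"{default_scheme}://{url}"


print(normalize_url("www.comune.example.it"))
print(normalize_url("http://www.ordine.example.it"))
```

Leaving empty values untouched avoids writing a bare `http://` into the url field for bodies that have no website listed.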

Yes, all central government departments (together with all central public bodies) are included. For many of them, indicePA also provides data on the internal organization of the departments, including in linked data format, but I guess that data is not of interest for this project.

@rufuspollock
Member

@traversaro could you submit a pull request? Could you also check the data is in line with our new spec for the CSV - see https://github.com/okfn/publicbodies#contribute-data

@traversaro
Contributor

See #58

I also saw that you are interested in hierarchical information. The Italian open data on public bodies' internal offices and structure is very detailed, but in this pull request I have only included the public bodies that have a "distinct corporate existence". Let me know if you are interested in more details.

@traversaro
Contributor

One thing I forgot: this same data is exposed as Linked Open Data ( http://spcdata.digitpa.gov.it/dataIPA.html ). Do you think there is a suitable place for the "sameAs" information linking the "official" URIs to the ones used in publicbodies.org?


3 participants