Collection Schema doesn't exist #34

Closed
abrin opened this issue Apr 6, 2021 · 15 comments

Comments

@abrin

abrin commented Apr 6, 2021

Describe the bug

Hi, I was able to find the definition of the Collection type in the CSV file (https://raw.githubusercontent.com/openschemas/schemaorg/master/schemaorg/data/releases/7.03/all-layers-types.csv) and in the JSON-LD files, but when I try to use it, it's reported as not existing. Here's the output of the test I created (shown below). Am I missing something?

Specification base set to http://www.schema.org
Using Version 7.03
Collection
Did you mean:
CollectionPage
---------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------
WARNING /Users/abrin/schemaorg/recipe.yml does not exist.
WARNING /Users/abrin/schemaorg/recipe.yml does not exist.
ERROR Collection is not a valid type!
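The "Did you mean: CollectionPage" hint above is a fuzzy-match suggestion against the type names that were loaded. As a hedged illustration only (this is not necessarily how schemaorg produces the hint), Python's standard-library difflib can generate the same kind of suggestion. The short `known_types` list is a hypothetical stand-in for the types read from the 7.03 schema-types.csv, which, as the thread goes on to show, lacks Collection:

```python
import difflib

# Hypothetical subset of type names loaded from the 7.03 schema-types.csv.
# "Collection" is deliberately absent, matching the reported error.
known_types = ["CollectionPage", "CreativeWork", "Dataset", "Person", "Organization"]


def suggest_types(name, candidates, n=3, cutoff=0.6):
    """Return `name` if it is known, else up to n close matches from candidates."""
    if name in candidates:
        return [name]
    return difflib.get_close_matches(name, candidates, n=n, cutoff=cutoff)


suggestions = suggest_types("Collection", known_types)
print("Did you mean:", ", ".join(suggestions))  # Did you mean: CollectionPage
```

A close match only means the spelling is similar. It cannot tell you that the type exists in a different data file, which turned out to be the real cause here.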

To Reproduce
Steps to reproduce the behavior:

I created a basic test:

from schemaorg.templates.google import make_person
from schemaorg.main.parse import RecipeParser
from schemaorg.main import Schema
import shutil
import os
import unittest
import tempfile

print("######################################################## test_schema")

class TestSchema(unittest.TestCase):

    def setUp(self):
        self.tmpdir = os.path.join(tempfile.gettempdir(), 'schemaorg-test')

        if not os.path.exists(self.tmpdir):
            os.mkdir(self.tmpdir)


    def tearDown(self):
        pass


    def test_collection(self):

        self.here = os.path.abspath(os.path.dirname(__file__))
        recipe_yml = os.path.join(self.here, "recipe.yml")
        self.recipe = RecipeParser(recipe_yml)
        self.collection = Schema("Collection")
        self.collection.add_property('name', 'test')
        self.collection.validate(self.collection)



if __name__ == '__main__':
    unittest.main()

recipe.yml

version: 1
schemas:
  Dataset:
    recommended:
      - softwareVersion: version
      - citation
      - identifier
      - keywords
      - license
      - url
      - sameAs
      - spatialCoverage
      - temporalCoverage
      - variableMeasured
      - includedInDataCatalog
    required:
      - description
      - name
  Person|Organization:
    required:
      - description
      - name
  Collection:
    required:
      - name


Thanks
Version of Python schemaorg
latest

@vsoch
Member

vsoch commented Apr 6, 2021

Thank you for the detailed test script - it made it easy to debug!

So the error is correct: the file we load schema types from is schema-types.csv, and it does not include Collection. Comparing the type names in the two files, the file you reference has many more:

set(layers).difference(set(schema))
Out[15]: 
{'3DModel',
 'Abdomen',
 'ActiveNotRecruiting',
 'AdvertiserContentArticle',
 'AerobicActivity',
 'AnaerobicActivity',
 'AnalysisNewsArticle',
 'AnatomicalStructure',
 'AnatomicalSystem',
 'Anesthesia',
 'Appearance',
 'ApprovedIndication',
 'ArchiveComponent',
 'ArchiveOrganization',
 'Artery',
 'AskPublicNewsArticle',
 'Atlas',
 'Audiobook',
 'AuthenticContent',
 'AuthoritativeLegalValue',
 'Ayurvedic',
 'BackgroundNewsArticle',
 'Bacteria',
 'Balance',
 'BenefitsHealthAspect',
 'BloodTest',
 'Bone',
 'BrainStructure',
 'BrokerageAccount',
 'BusOrCoach',
 'CDCPMDRecord',
 'CT',
 'CarUsageType',
 'Cardiovascular',
 'CardiovascularExam',
 'CaseSeries',
 'CategoryCode',
 'CategoryCodeSet',
 'CausesHealthAspect',
 'Chapter',
 'Chiropractic',
 'Claim',
 'Class',
 'Clinician',
 'CohortStudy',
 'Collection',
 'ComicCoverArt',
 'ComicIssue',
 'ComicSeries',
 'ComicStory',
 'CommunityHealth',
 'CompleteDataFeed',
 'Completed',
 'Consortium',
 'ContagiousnessHealthAspect',
 'CorrectionComment',
 'CoverArt',
 'CovidTestingFacility',
 'CriticReview',
 'CrossSectional',
 'CssSelectorType',
 'DDxElement',
 'DefinedTerm',
 'DefinedTermSet',
 'DefinitiveLegalValue',
 'Dentistry',
 'Dermatologic',
 'Dermatology',
 'Diagnostic',
 'DiagnosticLab',
 'DiagnosticProcedure',
 'Diet',
 'DietNutrition',
 'DietarySupplement',
 'DoseSchedule',
 'DoubleBlindedTrial',
 'Drawing',
 'DrivingSchoolVehicleUsage',
 'Drug',
 'DrugClass',
 'DrugCost',
 'DrugCostCategory',
 'DrugLegalStatus',
 'DrugPregnancyCategory',
 'DrugPrescriptionStatus',
 'DrugStrength',
 'Ear',
 'EducationalOccupationalCredential',
 'EducationalOccupationalProgram',
 'Emergency',
 'EmployerReview',
 'Endocrine',
 'EnrollingByInvitation',
 'EventAttendanceModeEnumeration',
 'EventSeries',
 'EvidenceLevelA',
 'EvidenceLevelB',
 'EvidenceLevelC',
 'ExchangeRateSpecification',
 'ExchangeRefund',
 'ExercisePlan',
 'Eye',
 'FDAcategoryA',
 'FDAcategoryB',
 'FDAcategoryC',
 'FDAcategoryD',
 'FDAcategoryX',
 'FDAnotEvaluated',
 'Flexibility',
 'FloorPlan',
 'FullRefund',
 'FundingAgency',
 'FundingScheme',
 'Fungus',
 'Gastroenterologic',
 'Genetic',
 'Genitourinary',
 'GeospatialGeometry',
 'Geriatric',
 'Grant',
 'GraphicNovel',
 'Guide',
 'Gynecologic',
 'Head',
 'HealthAspectEnumeration',
 'HealthInsurancePlan',
 'HealthPlanCostSharingSpecification',
 'HealthPlanFormulary',
 'HealthPlanNetwork',
 'HealthTopicContent',
 'Hematologic',
 'Homeopathic',
 'HowOrWhereHealthAspect',
 'ImagingTest',
 'InForce',
 'Infectious',
 'InfectiousAgentClass',
 'InfectiousDisease',
 'InternationalTrial',
 'InvestmentFund',
 'Joint',
 'LaboratoryScience',
 'LegalForceStatus',
 'LegalValueLevel',
 'Legislation',
 'LegislationObject',
 'LeisureTimeActivity',
 'LibrarySystem',
 'LifestyleModification',
 'Ligament',
 'LinkRole',
 'LivingWithHealthAspect',
 'Longitudinal',
 'Lung',
 'LymphaticVessel',
 'MRI',
 'Manuscript',
 'MaximumDoseSchedule',
 'MayTreatHealthAspect',
 'MediaManipulationRatingEnumeration',
 'MediaReview',
 'MedicalAudience',
 'MedicalBusiness',
 'MedicalCause',
 'MedicalClinic',
 'MedicalCode',
 'MedicalCondition',
 'MedicalConditionStage',
 'MedicalContraindication',
 'MedicalDevice',
 'MedicalDevicePurpose',
 'MedicalEntity',
 'MedicalEnumeration',
 'MedicalEvidenceLevel',
 'MedicalGuideline',
 'MedicalGuidelineContraindication',
 'MedicalGuidelineRecommendation',
 'MedicalImagingTechnique',
 'MedicalIndication',
 'MedicalIntangible',
 'MedicalObservationalStudy',
 'MedicalObservationalStudyDesign',
 'MedicalProcedure',
 'MedicalProcedureType',
 'MedicalResearcher',
 'MedicalRiskCalculator',
 'MedicalRiskEstimator',
 'MedicalRiskFactor',
 'MedicalRiskScore',
 'MedicalScholarlyArticle',
 'MedicalSign',
 'MedicalSignOrSymptom',
 'MedicalSpecialty',
 'MedicalStudy',
 'MedicalStudyStatus',
 'MedicalSymptom',
 'MedicalTest',
 'MedicalTestPanel',
 'MedicalTherapy',
 'MedicalTrial',
 'MedicalTrialDesign',
 'MedicalWebPage',
 'MedicineSystem',
 'MerchantReturnEnumeration',
 'MerchantReturnFiniteReturnWindow',
 'MerchantReturnNotPermitted',
 'MerchantReturnPolicy',
 'MerchantReturnUnlimitedWindow',
 'MerchantReturnUnspecified',
 'Midwifery',
 'MisconceptionsHealthAspect',
 'MissingContext',
 'MixedEventAttendanceMode',
 'MonetaryGrant',
 'MoneyTransfer',
 'MortgageLoan',
 'Motorcycle',
 'MotorizedBicycle',
 'MultiCenterTrial',
 'MulticellularParasite',
 'Muscle',
 'Musculoskeletal',
 'MusculoskeletalExam',
 'Neck',
 'Nerve',
 'Neuro',
 'Neurologic',
 'NewsMediaOrganization',
 'Newspaper',
 'NoninvasiveProcedure',
 'Nose',
 'NotInForce',
 'NotYetRecruiting',
 'Nursing',
 'OTC',
 'Observation',
 'Observational',
 'Obstetric',
 'OccupationalActivity',
 'OccupationalTherapy',
 'OfferForLease',
 'OfferForPurchase',
 'OfferShippingDetails',
 'OfficialLegalValue',
 'OfflineEventAttendanceMode',
 'Oncologic',
 'OnlineEventAttendanceMode',
 'OpenTrial',
 'OpinionNewsArticle',
 'Optician',
 'Optometric',
 'OriginalShippingFees',
 'Osteopathic',
 'Otolaryngologic',
 'OverviewHealthAspect',
 'PET',
 'PalliativeProcedure',
 'PartiallyInForce',
 'Pathology',
 'PathologyTest',
 'Patient',
 'PatientExperienceHealthAspect',
 'Pediatric',
 'PercutaneousProcedure',
 'PharmacySpecialty',
 'PhysicalActivity',
 'PhysicalActivityCategory',
 'PhysicalExam',
 'PhysicalTherapy',
 'Physiotherapy',
 'PlaceboControlledTrial',
 'PlasticSurgery',
 'Play',
 'PodcastEpisode',
 'PodcastSeason',
 'PodcastSeries',
 'Podiatric',
 'Poster',
 'PrescriptionOnly',
 'PreventionHealthAspect',
 'PreventionIndication',
 'PrimaryCare',
 'Prion',
 'ProductReturnEnumeration',
 'ProductReturnFiniteReturnWindow',
 'ProductReturnNotPermitted',
 'ProductReturnPolicy',
 'ProductReturnUnlimitedWindow',
 'ProductReturnUnspecified',
 'PrognosisHealthAspect',
 'Project',
 'PronounceableText',
 'Property',
 'Protozoa',
 'Psychiatric',
 'PsychologicalTreatment',
 'PublicHealth',
 'PublicToilet',
 'Pulmonary',
 'Quotation',
 'RadiationTherapy',
 'RadioBroadcastService',
 'Radiography',
 'RandomizedTrial',
 'RealEstateListing',
 'Recommendation',
 'RecommendedDoseSchedule',
 'Recruiting',
 'RefundTypeEnumeration',
 'Registry',
 'ReimbursementCap',
 'RelatedTopicsHealthAspect',
 'Renal',
 'RentalVehicleUsage',
 'RepaymentSpecification',
 'ReportageNewsArticle',
 'ReportedDoseSchedule',
 'ResearchProject',
 'RespiratoryTherapy',
 'RestockingFees',
 'ResultsAvailable',
 'ResultsNotAvailable',
 'Retail',
 'ReturnFeesEnumeration',
 'ReturnShippingFees',
 'ReviewNewsArticle',
 'Rheumatologic',
 'RisksOrComplicationsHealthAspect',
 'SatiricalArticle',
 'Schedule',
 'SchoolDistrict',
 'ScreeningHealthAspect',
 'SeeDoctorHealthAspect',
 'SelfCareHealthAspect',
 'SheetMusic',
 'ShortStory',
 'SideEffectsHealthAspect',
 'SingleBlindedTrial',
 'SingleCenterTrial',
 'Skin',
 'SpecialAnnouncement',
 'SpeechPathology',
 'StagesHealthAspect',
 'StatisticalPopulation',
 'StoreCreditRefund',
 'StrengthTraining',
 'StupidType',
 'Substance',
 'SuperficialAnatomy',
 'Surgical',
 'SurgicalProcedure',
 'Suspended',
 'SymptomsHealthAspect',
 'TaxiVehicleUsage',
 'Terminated',
 'Therapeutic',
 'TherapeuticProcedure',
 'Thesis',
 'Throat',
 'TouristDestination',
 'TouristTrip',
 'Toxicologic',
 'TraditionalChinese',
 'TreatmentIndication',
 'TreatmentsHealthAspect',
 'TripleBlindedTrial',
 'TypesHealthAspect',
 'Ultrasound',
 'UnofficialLegalValue',
 'Urologic',
 'UsageOrScheduleHealthAspect',
 'UserReview',
 'Vein',
 'Vessel',
 'VeterinaryCare',
 'VirtualLocation',
 'Virus',
 'VitalSign',
 'WebAPI',
 'WebContent',
 'WesternConventional',
 'Wholesale',
 'Withdrawn',
 'WorkBasedProgram',
 'XPathType',
 'XRay'}

including Collection :) The structure of the two files seems to be the same, but I'm not sure what distinguishes them. Could you ping folks at schema.org and find out what the difference is? If switching to the layers file won't cause problems (since it includes all of the types in the other file), we can update to that. If it's not the right thing to do, I can add a variable that lets you choose the file. I'd also be curious about the other ext files that have types: are they included in the layers file? If we can find the "one file to rule them all" for types, or at least understand the difference, I'd be happy to make a PR to update the package so it works for your use case. Thank you!
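The comparison above can be reproduced with the standard library alone. This is a minimal sketch that assumes each release CSV has a header row with a `label` column holding the type name. That column name is an assumption, so check the header of the files you actually use. The tiny inline CSVs are hypothetical stand-ins for all-layers-types.csv and schema-types.csv:

```python
import csv
import io


def type_labels(csv_file, column="label"):
    """Collect the set of type names from a schema.org release CSV.

    `column` is assumed to be the header holding the type name.
    """
    reader = csv.DictReader(csv_file)
    return {row[column] for row in reader if row.get(column)}


# Hypothetical stand-ins for all-layers-types.csv and schema-types.csv.
layers_csv = io.StringIO(
    "id,label\n"
    "http://schema.org/Collection,Collection\n"
    "http://schema.org/CollectionPage,CollectionPage\n"
    "http://schema.org/Dataset,Dataset\n"
)
schema_csv = io.StringIO(
    "id,label\n"
    "http://schema.org/CollectionPage,CollectionPage\n"
    "http://schema.org/Dataset,Dataset\n"
)

layers = type_labels(layers_csv)
schema = type_labels(schema_csv)

# Types present in the layers file but missing from schema-types.csv.
missing = layers.difference(schema)
print(sorted(missing))  # ['Collection']
```

With real files, pass an open handle instead, e.g. `with open(path, newline="") as f: labels = type_labels(f)`.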

@abrin
Author

abrin commented Apr 6, 2021

Good morning,
I took a closer look at the schema.org site and their releases and noticed that this repo stops at 7.03, but they're up to version 12.0. It also looks like they may have deprecated the all-types file (see), because the schema-types file only contains the core types, if I'm reading the comment correctly. Is it possible to update to the newer structure?

Thanks,

Adam

@vsoch
Member

vsoch commented Apr 6, 2021

Yes, definitely possible. I can possibly get to this on a weekend, but please feel free to open a PR with the new files first if you have the time!

@vsoch
Member

vsoch commented Apr 6, 2021

Hey! I think I can make some time at the end of the workday to fix this up for you. I'll prepare a PR with the new version for you to test.

@vsoch
Member

vsoch commented Apr 6, 2021

Give this a test! #35

I'm switching from CircleCI to GitHub Actions, so ignore the CI results for now (I disconnected it).

@abrin
Author

abrin commented Apr 6, 2021

Thanks!

I just tried it and I'm getting:
ERROR /workspace/schemaorg/schemaorg/data/releases/12.0/schema-types.csv does not exist.

I followed the install instructions:

git clone https://www.github.com/openschemas/schemaorg
cd schemaorg
python setup.py install

Should it be reading schemaorg-current-http-types.csv and schemaorg-current-http-properties.csv instead of schema-properties.csv and schema-types.csv?

I tried making that change, but now I get:

Specification base set to http://www.schema.org
Using Version 12.0
Found http://www.schema.org/Collection
Collection: found 121 properties
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    collection.validate(collection)
AttributeError: 'Schema' object has no attribute 'validate'

@vsoch
Member

vsoch commented Apr 6, 2021

Make sure you clone the branch from the linked PR (in your example you cloned the main repository, which doesn't have the changes). Also, I don't believe self.collection.validate was ever a supported function. Validation is done by the recipe parser, e.g.,

recipe = RecipeParser("recipe.yml")

Otherwise, the output looks good - I see that the Collection is found for version 12.0 with 121 properties!

Should it be reading schemaorg-current-http-types.csv and schemaorg-current-http-properties.csv instead of schema-properties.csv and schema-types.csv?

If you look at the changed code, you'll see that we are using those files, just the https versions.
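To make the file-name change concrete: the 7.03 release ships schema-types.csv and schema-properties.csv, while the 12.0 data uses the schemaorg-current-http(s) files. The helper below is a hypothetical sketch, not the package's actual code in schemaorg/data/__init__.py. It assumes the https files are named schemaorg-current-https-types.csv and schemaorg-current-https-properties.csv, looks for whichever naming scheme exists in a release folder, and raises a clear error instead of pointing at a file that isn't there:

```python
import os
import tempfile

# Candidate file names, newest naming scheme first. The https names are an
# assumption based on the http file names mentioned in this thread.
CANDIDATES = (
    "schemaorg-current-https-{kind}.csv",
    "schemaorg-current-http-{kind}.csv",
    "schema-{kind}.csv",
)


def find_release_file(release_dir, kind):
    """Return the path of the `kind` ("types" or "properties") CSV in release_dir."""
    for pattern in CANDIDATES:
        path = os.path.join(release_dir, pattern.format(kind=kind))
        if os.path.exists(path):
            return path
    raise FileNotFoundError(f"no {kind} CSV found in {release_dir}")


# Demo: a fake 12.0 release folder that contains only the new-style files.
with tempfile.TemporaryDirectory() as release_dir:
    for kind in ("types", "properties"):
        name = f"schemaorg-current-https-{kind}.csv"
        open(os.path.join(release_dir, name), "w").close()
    print(os.path.basename(find_release_file(release_dir, "types")))
    # schemaorg-current-https-types.csv
```

Trying several names keeps older releases such as 7.03 working while newer ones resolve to the new files. Pinning one name per release would be an equally reasonable design.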

@abrin
Author

abrin commented Apr 6, 2021

Confirming I'm on the right branch. I'm wondering whether schemaorg/data/__init__.py needs to be changed? Editing it is what got me from the previous error to the current one:

abrin@GT29036 schemaorg % git branch    
* add/release-12.0
  master
abrin@GT29036 schemaorg % python test.py
Specification base set to http://www.schema.org
Using Version 12.0
Found http://www.schema.org/Collection
Collection: found 121 properties
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    collection.validate(collection)
AttributeError: 'Schema' object has no attribute 'validate'

I'm just not sure how to address that. I've attached my two test files, which I pulled out of the test suite (I couldn't get pytest to run them without a module error, and this seemed faster).

test.zip

Thanks

@vsoch
Member

vsoch commented Apr 7, 2021

Sorry, I'm not sure you're hearing me: the Schema object does not have a validate function.

@vsoch
Member

vsoch commented Apr 7, 2021

In your example, you would need to do:

collection = Schema('Collection')
recipe = RecipeParser("recipe.yml")
recipe.validate(collection)

Does that make sense?
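Conceptually, a recipe lists the required (and recommended) properties for each type, and `recipe.validate(collection)` checks a schema instance against that list. Below is a toy, standard-library-only model of the idea, not the RecipeParser implementation. The recipe is written as a plain dict mirroring part of recipe.yml to avoid a YAML dependency, and a schema instance is represented as a hypothetical (type name, properties dict) pair:

```python
# Plain-dict mirror of the Person|Organization and Collection entries in recipe.yml.
RECIPE = {
    "Person|Organization": {"required": ["description", "name"]},
    "Collection": {"required": ["name"]},
}


def required_for(type_name, recipe):
    """Collect required properties for a type; a key like "A|B" applies to A and B."""
    required = []
    for key, rules in recipe.items():
        if type_name in key.split("|"):
            required.extend(rules.get("required", []))
    return required


def missing_required(type_name, properties, recipe=RECIPE):
    """Return the required properties that are absent or empty in `properties`."""
    return [p for p in required_for(type_name, recipe) if not properties.get(p)]


print(missing_required("Collection", {"name": "test"}))  # []
print(missing_required("Person", {"name": "Adam"}))  # ['description']
```

The point of the toy is the division of labour the thread describes: the schema instance only holds properties, and the recipe is what knows which of them are required.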

@abrin
Author

abrin commented Apr 7, 2021

Yes, that makes sense. Thanks for your patience, and apologies for the confusion and misunderstanding I caused. I tested again with the corrected validation code (I had entered it wrong before). It works, but only with a change to these two lines:

the reference to schema-properties in "schemaorg/schemaorg/data/__init__.py:90"
and
the reference to schema-types.csv in "schemaorg/schemaorg/data/__init__.py:104"

@vsoch
Member

vsoch commented Apr 7, 2021

@abrin that was my mistake! I had the changes locally and for some reason they didn't push. Please take another look!

@abrin
Author

abrin commented Apr 7, 2021

That did it. My test works. Thank you.

@vsoch
Member

vsoch commented Apr 7, 2021

Great! Apologies for not pushing the commits. I was testing different GPG keys yesterday, and I think one of my commits just didn't take (and the other one, which added the new data, was so large that I didn't notice).

I'll get the PR merged and released ASAP.

@vsoch
Member

vsoch commented Apr 7, 2021

Fixed with #35.

vsoch closed this as completed Apr 7, 2021