Proposal: Enabling 'Club Finder' capability #313

Open
howaskew opened this issue Oct 19, 2023 · 6 comments

Comments

@howaskew

howaskew commented Oct 19, 2023

Proposers

Howard Askew (@howaskew), Darren Temple, Tim Corby and Andrew Newman (the OpenActive team at the Open Data Institute), in relation to conversations with London Sport, Yorkshire Sport Foundation and bodies such as the Football Association (FA) and the National Small-bore Rifle Association (NSRA).

Use Case

As an activity provider, facility provider or sporting body, I would like to share my data at the location or ‘club’ level. This would enable club finder style capabilities in 3rd party applications - allowing people to discover activity providers in a specific location.

My organisation may not yet be ready to publish event or session level data or to take open bookings, but would like to take first steps toward open data.

On an activity finder, a potential participant would see a location with activity type, website links and contact details. For example, a football club might list an adult men's team, junior teams at different age brackets, and women's and girls' teams. Participants would then be able to get in touch to find out more and to make a booking.

Additional Use Cases

As a local authority or active partnership, I would like to better map the provision of sport and physical activity in my jurisdiction in order to:

  • inform strategic decision making about the allocation of funding, resources and interventions
  • more easily support national campaigns and initiatives at a local or regional level
  • make comparisons with health, economic and demographic data to generate insights

As a social prescriber or link worker, I would like to understand the local provision of clubs accredited to national governing bodies in order to be better assured that the opportunities I refer my clients to have accountability related to safeguarding, risk management, coaching qualifications etc.

Why is this not covered by existing properties?

The existing properties could cover this proposed use case. For example, the FacilityUse and/or IndividualFacilityUse data model types could represent football pitches at a club site. However, the current guidance for officially accepted feeds suggests that additional timed, event-level data is also required, i.e. the Slot data model type.

Please provide a link to example data

Examples 4 and 5 in the specification link organisations, locations and activities.

The data visualiser 'map' tab shows examples of existing data which can be presented in a ‘club finder’ style.

[Screenshot: data visualiser 'map' tab, 2023-10-19]

Related initiatives

London Sport are currently (October 2023) running a project to collect community club data across London. This includes:

  • Surveying organisations to capture the minimum / required fields to create an ‘open club standard’. They are exploring how this can align with existing standards including OpenActive.
  • Developing tools for central / bulk capture of club data that may be useful beyond the initial focus area of London.

There are many opportunities for collaboration here to maximise the impact of this work and of the OA team’s work, as we have also been exploring options for bulk upload from spreadsheet / CSV to an RPDE-style feed.

Proposals

Once club data is collected, how can it be disseminated to maximise impact? If it can be shared as open data alongside existing OpenActive feeds, it can be available for display in 3rd party tools like activity finders and directories.

Here are some initial options to achieve this; we welcome comments and alternative approaches:

Option 1
A new feed type for clubs, requiring activity type, location, organisation and contact details, but not requiring event level data.

Not recommended, as:

  • we do not wish to add another level of complexity to the feed specifications and documentation, and
  • utilising an existing feed type could simplify the process of adding event level data at a later date.

Option 2
Use the existing SessionSeries or FacilityUse feed types to provide information on the activities or facilities at a location or provided by an organisation. This will require revised guidance on handling Series and Facilities without specific Sessions or Slots subevent data.

Clarifying the documentation around feeds is already on the ‘to-do’ list, so this may seem like a promising approach. However, we’d need to explore the impacts on data consumers.

Option 3
Use the existing SessionSeries or FacilityUse feed types with a dummy subevent for each activity or facility. The event will have an end date in the future.

Again, this will require updated guidance. However, it may prove easier to implement and manage from a data consumer perspective.

Option 4
Use the existing SessionSeries or FacilityUse feed types, without subevents, but with the addition of a new boolean field like "isClub".
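
As a rough illustration of Option 4, the Python sketch below shows how a data consumer might separate club-level items from ordinary ones. The `isClub` property is the hypothetical flag named in this option and is not part of the OpenActive specification today; the item values and the helper `split_club_items` are made up for the example.

```python
import json

# Illustrative SessionSeries items; all values are invented for this example.
# "isClub" is the hypothetical boolean from Option 4, not an existing property.
items = [
    {
        "@type": "SessionSeries",
        "name": "Example Rifle Club",
        "activity": [{"@type": "Concept", "prefLabel": "Shooting"}],
        "url": "https://club.example.org/",
        "isClub": True,
    },
    {
        "@type": "SessionSeries",
        "name": "Tuesday Beginners Yoga",
        "activity": [{"@type": "Concept", "prefLabel": "Yoga"}],
        "url": "https://studio.example.com/yoga",
    },
]


def split_club_items(series):
    """Return (clubs, sessions), treating a missing flag as 'not a club'."""
    clubs = [s for s in series if s.get("isClub") is True]
    sessions = [s for s in series if s.get("isClub") is not True]
    return clubs, sessions


clubs, sessions = split_club_items(items)
print(json.dumps([c["name"] for c in clubs]))
```

Existing consumers that do not know about the flag would still see these items as ordinary SessionSeries, which is the differentiation risk this option would need to address.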

Option 5
Instead of RPDE-style feeds, use a potentially simpler dissemination format for slowly changing data, something like CSV on the Web. This pairs a CSV file with metadata that describes how the data should be interpreted.
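
To give a feel for Option 5, here is a minimal Python sketch that writes a small club CSV and a CSV on the Web (CSVW) metadata document describing its columns. The column names, the file name `clubs.csv` and the choice of `website` as primary key are assumptions for illustration, not an agreed club standard.

```python
import csv
import io
import json

# Hypothetical club rows; the column names are assumptions, not an agreed standard.
rows = [
    {"club_name": "Example Rifle Club", "postcode": "AB1 2CD",
     "website": "https://club.example.org/", "min_age": "12"},
]

# Write the CSV body to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# CSVW metadata: tells a consumer how each column should be interpreted.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "clubs.csv",
    "tableSchema": {
        "columns": [
            {"name": "club_name", "titles": "club_name",
             "datatype": "string", "required": True},
            {"name": "postcode", "titles": "postcode", "datatype": "string"},
            {"name": "website", "titles": "website", "datatype": "anyURI"},
            {"name": "min_age", "titles": "min_age", "datatype": "integer"},
        ],
        "primaryKey": "website",
    },
}
metadata_text = json.dumps(metadata, indent=2)
```

By convention the metadata would be published next to the data as `clubs.csv-metadata.json`, so a consumer that finds the CSV can also find its description.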

Example

The NSRA currently lists some information about their clubs in tabular format. (The map works intermittently, screenshot below). They would like some of this information to be visible alongside OpenActive opportunity data in 3rd party activity finders.
[Screenshot: NSRA club map, 2023-10-19]

Broader Issues

A. In a situation where individual clubs can upload their information to a central portal, and NGBs are starting to provide bulk uploads of information for affiliated clubs, how do we tackle data quality issues like accuracy, duplication and currency?

B. With no event level data, there is no direct opportunity to receive revenue from bookings, so there is potentially less financial incentive for some organisations in the data ecosystem to handle this additional data transfer. However, we feel there is potential value to be found elsewhere in providing an easier entry to the world of open data, to support upskilling and increased maturity in data and digital in the sector, and to simplify the move to event level data and booking revenue as the sector matures and the market grows.

@nickevansuk
Contributor

Referencing previous proposal #295 on the same subject

@nickevansuk
Contributor

nickevansuk commented Oct 19, 2023

This proposal is great to see, as is the London Sport work that's happening to support this.

My understanding is that club data is helpful to end users, especially in sports which are club-based, if it is sustainably accurate.

Previous efforts to share secondary data within OpenActive have not been successful. See these feeds for examples (noting that these feeds are not currently in use, due to the quality of the data within each, which degraded over time):

In all cases the ambition was to "get open data published and data users will come". Due to the issues with data quality and accuracy, data users didn't come, which led the feeds to degrade.

Some considerations:

  1. Primary vs. secondary sources:
    • OpenActive is principally concerned with data from primary sources - i.e. data from the provider's own systems.
    • This is in contrast to many activity finders that existed before OpenActive that used surveys or self-service means to collect data - resulting in low quality data that degrades in accuracy over time.
    • Such secondary-source products contributed to low end-user trust around aggregated data in our sector. If a user finds a hotel on Kayak or Booking.com, they don't doubt that it exists. In our sector, there's historically been a much lower degree of trust around online data relating to activities.
    • OpenActive tries to fix this with accurate data from primary sources, in order to increase trust.
    • OpenActive uses data that's as close as possible to the operations of the activity provider - i.e. their own booking system, or another system that they have an existing operational need to keep up-to-date. These booking systems are often in active use by the provider, which makes them intrinsically more sustainable.
    • It is easy to quickly publish secondary data, and more difficult to publish primary data. However, OpenActive has demonstrated that data feeds from primary sources are more sustainable and have more value to data users and end-users.
  2. Value of the data:
    • While it is highly likely that data users see value in primary-source club data, how much value would they see in secondary-source (and therefore low/degrading quality) club data? (e.g. would there be concerns around having their brand associated with such potentially out-of-date data?)
    • Might the inclusion of this type of data mixed with near-real-time OpenActive data dilute user trust in such brands?
  3. Data publishing approach: Feed vs CSV vs Structured data:
    • What is the most sustainable approach to publishing this data, such that it remains accurate?
      • RPDE Feeds are effective when publishing data from primary sources
      • CSVs are useful for sharing point-in-time data snapshots
      • Embedded structured data is useful for making websites the source of open data
  4. Modelling approach:
    • Each of the existing data types currently has a semantic meaning (https://developer.openactive.io/data-model/data-model-overview).
    • What is the most consistent way to model club data in keeping with OpenActive.io and Schema.org?
    • It is important to focus on semantics as this provides a strong foundation on which the specification can evolve - the right semantic decisions make adding further properties trivial.
    • It is worth noting that although seductive as a quick-and-dirty approach, publishing different types of data within existing semantics is often unhelpful - as it is likely unusable by existing data users and hard to differentiate.
  5. Digital transformation
    • Some clubs in our sector do not have websites, and therefore are unlikely to have a booking system. There is a multi-stage challenge with publishing data to OpenActive where the level of digital literacy is so low.
    • The wider strategy for sector-wide digital transformation needs to be considered here, rather than "publishing to OpenActive" or "getting data open" being the end goal in itself.
    • For example: it might be reasonable to require a club to have their own website in order to list on OpenActive - so that the end-user has confidence in the accuracy of the data. Efforts to increase digital literacy can therefore start with a website, and move on to OpenActive after the website is in place.
    • Being realistic about the pace of change of our sector is a key learning of OpenActive - better to publish less data that is sustainably accurate than have more data that will degrade over time.
    • It might also be helpful to give e.g. NGBs the tools to find where websites have fallen into disrepair, so that they can offer resources, recommendations, or guidance to resolve this.

Following these considerations, another approach to consider is a combination of the following:

  • Publishing a catalogue of club website URLs (from e.g. NGBs and APs; or from sport website systems) via simple JSON file
  • Asking clubs to include structured data markup (i.e. the Google-backed standard of embedded JSON-LD) in their existing website (and ask them to get a website if they don't already have one)
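
A minimal sketch of the consuming side of this approach, in Python using only the standard library: it pulls embedded JSON-LD out of a club web page. The sample HTML, the use of schema.org's `SportsClub` type and the helper names are assumptions for illustration; the actual model for club data has not been agreed in this thread.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # Skip malformed markup rather than fail the whole crawl.


def extract_jsonld(html):
    """Return a list of JSON-LD documents found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.documents


# Hypothetical club page; in practice the HTML would be fetched from each
# URL listed in the catalogue file.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "SportsClub",
 "name": "Example Rifle Club", "url": "https://club.example.org/"}
</script>
</head><body>Welcome</body></html>
"""

clubs = extract_jsonld(sample_html)
```

A page with no JSON-LD, or with markup that fails to parse, simply yields an empty list, which is the signal a catalogue maintainer could use to follow up with that club.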

Advantages of this approach:

  • Globally, it's the most popular approach to sharing structured data in an open format - it's in use by over 10 million web sites.
  • Website URLs can be easily and reliably de-duplicated
  • Website URLs can also be used to link this data to booking system published data
  • It is as close to "primary data" as we can likely get for this data type
  • Embedded structured data will also improve SEO for the club, as it's the same type of data that powers Google's knowledge graph. Such data will also therefore be available to Siri, Alexa, Google Assistant, and OpenAI.
  • Easy to validate (OpenActive validator, schema.org validator, and Google's own validator already support this)
  • Easy to support with tooling - we could create an OpenActive version of the many tools that already exist to do this for websites.
  • Easy to consume - the consumption logic is the same as is already used to spider OpenActive dataset sites
  • There are many online tutorials for different technical platforms to provide guidance that already exist.
  • Requires the club to take action, which ensures the data is from source.
  • The club only needs to do this once, and can update it when they update their website.
  • Technically very simple to implement and cheap to adopt.
  • Lowers the barrier to entry of OA, while maintaining the data quality.
  • Is sustainable from an infrastructure perspective.
  • We can easily report on URLs that are no longer available or that no longer include structured data, in order to help catalogue maintainers (e.g. NGBs) to know who to reach out to.
  • Can be easily adopted by existing club website platforms e.g. https://myclubpro.co.uk/ https://mysportsclub.co.uk/ https://www.sportmember.co.uk/ etc.
  • Such platforms could be promoted by NGBs to easily help clubs get a website, which drives digital transformation within the sector overall (and promotion by NGBs acts as an incentive for adoption)
  • Encouraging clubs to have decent websites also makes them more visible via Google etc without relying on only OpenActive finders - so it helps improve user journeys for potential participants across multiple platforms, which broadens the reach of the clubs overall.
  • This approach also increases the chances of a good user experience, as all club data will have a website which the end-user can click-through to (though doesn't guarantee the quality of the website).
  • NGBs and APs could easily get a dashboard of how many of the clubs in their remit have already adopted (by just using the list of club website URLs they already have), which helps them to influence and support as appropriate.
  • Data can be easily read by e.g. a listing system like Open Sessions when the URL is entered, to avoid double-entry.
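
On the de-duplication point in the list above, a rough Python sketch of how website URLs from several catalogues might be reduced to a comparable key. The normalisation rules (ignore scheme, a leading `www.`, trailing slashes, query and fragment) are assumptions; a real catalogue would need to agree its own rules. Inputs are assumed to include a scheme such as `https://`.

```python
from urllib.parse import urlsplit


def normalise_url(url):
    """Reduce a club website URL to a comparable key (illustrative rules only)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    # Scheme, query and fragment are ignored so http/https variants match.
    return f"{host}{path}"


def deduplicate(urls):
    """Keep the first spelling seen for each normalised URL, preserving order."""
    seen = {}
    for url in urls:
        seen.setdefault(normalise_url(url), url)
    return list(seen.values())


# Hypothetical URLs as they might appear in an NGB list and an AP list.
submitted = [
    "https://www.club.example.org/",
    "http://club.example.org",
    "https://other.example.org/teams/",
]
unique = deduplicate(submitted)
```

The same key could also be used to match a club's website to the organisation URL in booking system feeds, as suggested in the list.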

Disadvantages:

  • Requires a one-off technical action from each club by whoever looks after their website, which is more friction than a survey.
  • Might discourage clubs from publishing more detailed session data via e.g. Open Sessions (though as above, they are complementary)
  • Only works if the data complexity is low (i.e. number of fields is relatively small)

Summary:

  • It is easy to quickly publish secondary data, and more difficult to publish primary data.
  • However, as OpenActive has demonstrated, data from primary sources is more sustainable and has more value to data users and end-users.
  • This is an opportunity for OpenActive to be a catalyst for digital transformation.

It's also worth noting that the London Sport work to "survey organisations to capture the minimum / required fields to create an ‘open club standard’" is a useful prerequisite to all of this, as before data can be decentralised sustainably it must first be proven to be useful centrally.

Suggest it may be worth waiting for London Sport's survey work to conclude before we continue the conversation around the above. Until we get consensus on the fields in scope, it's difficult to have a useful conversation around the mechanism of data transfer or the data model. A high volume of varied data from a recent club survey is a fantastic resource to support these conversations.

If London Sport have not already considered engaging the team at https://everybodymoves.org.uk/, who have experience with this for their specific audience, that would likely be worth doing.

@citizenfish

Very interesting analysis there, Nick. I'd add a few things to your proposal, which makes huge sense to me.

a) Many clubs consider their Facebook page to be their website, so some pedantry is probably required to define "website".

b) JSON-LD is way beyond the vast majority of website managers, most of whom are frantically fighting WordPress with limited technical skills. My view is this needs to be completely obfuscated from them via a simple toolset that aids the creation and testing of it.

c) I'll probably phrase this badly, but I wondered: if the club hasn't got a website but has got a club finder entry, then in effect they have got a website, as there is an online electronic record pointing to their details. My poorly formed idea is that, in the absence of a URL, a placeholder would format their club finder record as a simple HTML page.

d) I see where you are coming from with the phrase "encouraging clubs to have decent websites", but we need to recognise that in the main these are not created or maintained by IT experts. They are often cobbled together by enthusiasts frantically searching the internet for the easiest and best looking option. This got me wondering whether there is value in an OpenActive variant of Google Search Console which tests your website for "fitness for OpenActive publishing". I'd love to build this.

e) My personal view is that everything possible should be done to get the least technically aware clubs listed, as these are the missing entries in the online world, and currently the social networks / village noticeboards own this data.

In summary, I like your proposal for publishing a catalogue of URLs, as it simplifies discovery for me. The barrier to consuming this data is low, and the work to get a standard in place should be relatively straightforward and hence quick.

@howaskew
Author

Thanks for the contributions so far on this. Starting with @nickevansuk's points...

Previous efforts: The data visualiser tool gives us a convenient view on to the quality of feeds and we'll start to follow up with England Squash and British Orienteering as resource allows. Table Tennis 365 doesn't have a live dataset site, and hasn't for some time, so I'm proposing to remove it from the catalogue.

Ambitions: I think the ambition has evolved a little beyond "get open data published and data users will come". We know there are data users. This is about helping organisations progress along the journey towards digital transformation.

Primary vs secondary data: To clarify in this case:
Primary club data would be data provided and maintained directly by an individual club.
Secondary club data would be lists of clubs provided by NGBs, local authorities, other sources.

Value and Trust: This is related to the quality and currency of the data, rather than the primary or secondary status. I agree that in general primary / operational data should be better maintained. But it could be that some NGBs provide well maintained, accurate lists of clubs. Can we base the inclusion on quality (including maintenance) rather than source?

More generally in terms of trust in OpenActive data, I think we are already in a position where there is a wide range of quality and accuracy and currency of data in the OA feeds. Effectively, the London Sport work could act as a pilot where we can explore any impacts from including club data.

Data Publishing Approach: I think a consideration here is not just accuracy, but usability. Presenting club data in a format that can be read in alongside session data seems sensible. @Reikyo has prototyped a simple tool to transform a table / spreadsheet list into a 'compliant' RPDE feed. This would mean combining the club data with session data in existing tools is trivial. (Data users - please correct me if I am oversimplifying this.)

Modelling: Fully agree on the semantics being key, and on the principle of differentiation.

Digital Transformation: Agree it is critical to learn from the past, but initiatives like London Sport's current project are happening and we need to make the most of such opportunities, share our learning and try to manage challenges and risks as best we can. We are contributing to wider data and digital strategies in Sport where we can.

Nick's proposal, which we can call 'Option 6': Clubs add standardised information as JSON-LD in their websites, and the website URLs are listed in a JSON catalogue somewhere. Nick makes many valid points about improving SEO and nudging towards digital transformation. However, it seems that this approach falls one step short of actually presenting club data in a format that allows it to be presented in existing RPDE-driven OpenActive tools.

@citizenfish's enhancements - a tool to obfuscate the JSON-LD creation, a tool to create a holding page from JSON-LD content, a tool to validate websites in terms of OpenActive - are great but they don't tackle that.

So, Option 7: Alongside this approach, OpenActive could host a version of Darren's tool to supply the extracted JSON-LD club data as an RPDE feed that users can choose to incorporate with existing tools with little or no extra effort. This could bring new club data into, for example, Active Partnerships' existing activity finder tools.
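
To make Option 7 a little more concrete, here is a minimal Python sketch of wrapping already-extracted club records in an RPDE page. This is not the prototype tool mentioned in the thread; the endpoint URL, the `kind` value "Club" and the record shape are assumptions. The ordering and paging follow the RPDE modified-timestamp-and-ID approach: items are sorted by `modified` then `id`, and the `next` URL carries the last item's values.

```python
from urllib.parse import urlencode

BASE_URL = "https://feeds.example.org/clubs"  # hypothetical feed endpoint


def build_rpde_page(records, after_timestamp=0, after_id="", limit=500):
    """Build one RPDE page from club records.

    Each record is a dict with 'id' (str), 'modified' (int) and 'data' (JSON-LD).
    Deleted items are not handled in this sketch.
    """
    ordered = sorted(records, key=lambda r: (r["modified"], r["id"]))
    # Tuple comparison: modified > afterTimestamp, or equal and id > afterId.
    page = [r for r in ordered
            if (r["modified"], r["id"]) > (after_timestamp, after_id)][:limit]
    items = [
        {"state": "updated", "kind": "Club", "id": r["id"],
         "modified": r["modified"], "data": r["data"]}
        for r in page
    ]
    if page:
        after_timestamp, after_id = page[-1]["modified"], page[-1]["id"]
    # On an empty page the 'next' URL repeats the request parameters.
    query = urlencode({"afterTimestamp": after_timestamp, "afterId": after_id})
    return {"next": f"{BASE_URL}?{query}", "items": items}


# Hypothetical records, e.g. produced by crawling the club website catalogue.
sample_records = [
    {"id": "club-2", "modified": 1700000200,
     "data": {"@type": "SportsClub", "name": "Example Football Club"}},
    {"id": "club-1", "modified": 1700000100,
     "data": {"@type": "SportsClub", "name": "Example Rifle Club"}},
]
first_page = build_rpde_page(sample_records)
```

A data user would poll the `next` URL as they do for existing feeds, which is what would allow club data to sit alongside session data in current tools.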

@howaskew
Author

As Nick suggested, seeing the actual data fields being considered would be helpful in this discussion.
So, here is the latest draft from London Sport.

Club Name (instead of organisation name)
Session Location (Venue name, Street, Town/City, County, Postcode)
Session Location 1
Session Location 2
Email address
Activity Type (Multi-select drop down menu from OA activity list)
Club Website URL
Club Social URL
Contact Phone Number
Description
Beginner
Intermediate
Advanced
Minimum Age
Maximum Age
Gender (Open / Male / Female)
Is the club members only or open to all? (None / No, a membership is not required / Yes, a membership is required)
Accessibility information

We have a meeting lined up early December. I'll take along relevant points from this discussion - primary / secondary, currency and trust, semantic model, etc. But obviously also interested in any further comments.

@Julz-YSF

Julz-YSF commented Feb 5, 2024

> As Nick suggested, seeing the actual data fields being considered would be helpful in this discussion. So, here is the latest draft from London Sport.
>
> Club Name..

Would suggest adding

AKA - lots of clubs use sport acronyms (e.g. ARLFC); some will spell them out
campaign/keyword - free back-end fields that can include campaign tags or things like local authority areas
siteID - Sport England's Active Places reference for mapping locations; it covers maybe 90% of venues
text - so clubs can add an intro or a note like "sessions most Thursdays"
admin email - the public contact is not always the main admin contact

4 participants