Add Import format reader #129
Current workaround:

#!/usr/bin/env perl
use v5.14.1;
use Encode;

# Convert PICA+ export format to normalized PICA+ with valid UTF-8
# Input format:
# - each record starts with an empty line and a line with \x1D
# - each field is one line, started with \x1E
while (<>) {
    chomp;
    next unless $_;    # ignore empty lines
    if ( $_ eq "\x1D" ) {
        say "" if $. > 2;    # start of next record
    }
    else {
        if ( $_ =~ /^\x1E[012][0-9][0-9][A-Z@]/ ) {
            my $field = substr $_, 1;

            # invalid UTF-8 => U+FFFD (Unicode REPLACEMENT CHARACTER)
            my $bytes = encode( 'UTF-8', decode( 'UTF-8', $field ) );
            if ( $field ne $bytes ) {
                warn "$.: invalid UTF-8\n";
            }
            print $bytes, "\x1E";
        }
        elsif ( $. > 2 ) {
            # unexpected line that is not a field: report it on STDERR
            # so it does not end up in the converted output
            warn "$.: '$_'\n";
        }
    }
}

# newline after last record
say "";
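For illustration, the input layout described in the comments above can also be read into records in memory. This is only a minimal sketch under the same assumptions as the workaround (a line containing just \x1D introduces a record, each field is one line starting with \x1E). The function name `split_import_records` and the sample data are hypothetical; this is not the implementation of PICA::Parser::Import.

```perl
use v5.14.1;

# Hypothetical helper, not part of PICA::Data: split a string in the
# import layout into records, each an array reference of field strings.
# Assumed layout: a line containing only \x1D starts a record and each
# field is one line starting with \x1E.
sub split_import_records {
    my ($input) = @_;
    my @records;
    for my $line ( split /\n/, $input ) {
        if ( $line eq "\x{1D}" ) {
            push @records, [];    # start of a new record
        }
        elsif ( $line =~ /^\x{1E}([012][0-9][0-9][A-Z@].*)$/ ) {
            push @records, [] unless @records;    # field before any \x1D
            push @{ $records[-1] }, $1;           # field without leading \x1E
        }
        # empty lines and anything else are skipped silently
    }
    return @records;
}

# Made-up sample: two records with two and one fields
my $sample = "\n\x{1D}\n"
  . "\x{1E}003\@ \x{1F}0123\n"
  . "\x{1E}021A \x{1F}aTitle\n"
  . "\n\x{1D}\n"
  . "\x{1E}003\@ \x{1F}0456\n";

my @records = split_import_records($sample);
say scalar(@records), " records";
say scalar(@$_), " field(s)" for @records;
```

Unlike the workaround, this sketch does not repair invalid UTF-8; it only shows the record and field boundaries.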
nichtich added a commit that referenced this issue on Aug 9, 2023
Implemented in release 2.10 (not released yet).
Changelog diff is:

diff --git a/Changes b/Changes
index 9d70e1b..c0520cc 100644
--- a/Changes
+++ b/Changes
@@ -1,6 +1,8 @@
 Revision history for PICA::Data
 {{$NEXT}}
+
+2.10 2023-08-09T14:01:25Z
 - Add PICA Import format parser (#129)
 - Add parser counter (method: count)
In continuation of #128, add PICA::Parser::Import based on https://wiki-cbs.oclc.org/wiki/images/Software_for_Data_Import.pdf.