analyze-characters

Perl script that counts the ASCII, extended ASCII, and UTF-8 characters in a file.

This script was used to analyze large (up to 1 GB) XML files containing product catalog data from various paid search clients.
These files were subsequently converted into other formats for entry into paid search platforms such as Google AdWords. That process supported only plain 7-bit ASCII. Any use of other character sets, such as the extended ASCII portions of Latin-1 or WinLatin-1, or UTF-8, would cause Google search results to contain 'junk' characters. The script identified the line numbers on which such unacceptable characters occurred, so that we could tell clients which products needed correcting, or correct them ourselves.
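The kind of check described above can be sketched as follows. This is a minimal illustration, not the script from this repository: the names `classify_line` and `scan_handle` are hypothetical, and the UTF-8 test is deliberately loose (it looks only at lead and continuation byte ranges). It assumes the file is read as raw bytes and that any high byte not part of a UTF-8 sequence counts as extended ASCII.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the byte classes found in one line.
# Returns a hash ref with keys ascii, utf8 (multibyte sequences), extended.
sub classify_line {
    my ($line) = @_;
    my %count = (ascii => 0, utf8 => 0, extended => 0);
    pos($line) = 0;
    while (pos($line) < length($line)) {
        if ($line =~ /\G[\x00-\x7F]/gc) {
            $count{ascii}++;                 # plain 7-bit ASCII
        }
        elsif ($line =~ /\G(?:[\xC2-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF4][\x80-\xBF]{3})/gc) {
            $count{utf8}++;                  # lead byte plus continuation bytes (loose check)
        }
        else {
            $line =~ /\G./gcs;               # consume one stray high byte
            $count{extended}++;              # assumed to be Latin-1 or similar
        }
    }
    return \%count;
}

# Hypothetical helper: scan a filehandle line by line, which keeps memory
# use flat even for very large files. Returns the totals and the list of
# line numbers that contain any non-ASCII byte.
sub scan_handle {
    my ($fh) = @_;
    my %total = (ascii => 0, utf8 => 0, extended => 0);
    my @bad_lines;
    while (my $line = <$fh>) {
        my $c = classify_line($line);
        $total{$_} += $c->{$_} for keys %$c;
        push @bad_lines, $. if $c->{utf8} || $c->{extended};
    }
    return (\%total, \@bad_lines);
}

# Run as a script only when a file name is given on the command line.
if (!caller && @ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my ($total, $bad) = scan_handle($fh);
    close $fh;
    printf "%-9s %d\n", $_, $total->{$_} for qw(ascii utf8 extended);
    print "Non-ASCII on lines: @$bad\n" if @$bad;
}
```

Opening the file with the `:raw` layer keeps Perl from decoding the input, so each character the regexes see is a single byte. A stricter version could also reject overlong encodings and surrogate code points.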