urlcleaner

This library provides functions to normalize, unshorten, and extract URLs.

urlcleaner API uses JDK 8.

URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized URL so it is possible to determine if two syntactically different URLs may be equivalent.
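To make the idea concrete, here is a minimal sketch of a few common normalization steps built only on `java.net.URI`. It is not urlcleaner's implementation: the class `NormalizeSketch` and its `normalize` method are hypothetical names, and the rules chosen (default to http, drop `www.`, drop default ports, decode the path, strip a trailing slash, sort query parameters) are assumptions that mirror the usage examples further down. IDN hosts and fragments are not handled, and fully decoding the path is a simplification that can be lossy for reserved characters.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration; urlcleaner's real logic may differ.
class NormalizeSketch {

    static String normalize(String input) {
        // Assumption: a URL without a scheme is treated as http.
        String withScheme = input.contains("://") ? input : "http://" + input;
        URI uri = URI.create(withScheme).normalize(); // also removes "." and ".." segments

        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in: " + input);
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4);
        }

        // Drop the port when it is the default for the scheme.
        int port = uri.getPort();
        boolean defaultPort = port == -1
                || (scheme.equals("http") && port == 80)
                || (scheme.equals("https") && port == 443);

        // getPath() returns the percent-decoded path, e.g. %7E becomes ~.
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }

        // Sort query parameters so that their order does not matter.
        String query = uri.getRawQuery();
        if (query != null) {
            String[] params = query.split("&");
            Arrays.sort(params);
            query = String.join("&", params);
        }

        StringBuilder out = new StringBuilder(scheme).append("://").append(host);
        if (!defaultPort) {
            out.append(':').append(port);
        }
        out.append(path);
        if (query != null) {
            out.append('?').append(query);
        }
        return out.toString(); // the fragment, if any, is dropped
    }

    public static void main(String[] args) {
        System.out.println(normalize("https://www.shekhargulati.com:443")); // https://shekhargulati.com
        System.out.println(normalize("http://shekhargulati.com/%7Eabout/")); // http://shekhargulati.com/~about
    }
}
```

With rules like these, two URLs can be compared by checking whether their normalized forms are equal strings.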

Getting Started

To use urlcleaner in your application, add it to your classpath. urlcleaner is available on Maven Central, so you only need to add the dependency to your build tool of choice, as shown below.

Apache Maven users can add the following to their pom.xml.

<dependencies>
    <dependency>
        <groupId>com.shekhargulati.urlcleaner</groupId>
        <artifactId>urlcleaner</artifactId>
        <version>0.4.0</version>
        <type>jar</type>
    </dependency>
</dependencies>

Gradle users can add the following to their build.gradle file.

compile(group: 'com.shekhargulati.urlcleaner', name: 'urlcleaner', version: '0.4.0', ext: 'jar')

URL Normalization Usage

import com.shekhargulati.urlcleaner.UrlCleaner;

UrlCleaner.normalizeUrl("shekhargulati.com"); // http://shekhargulati.com

UrlCleaner.normalizeUrl("https://www.shekhargulati.com:443"); // https://shekhargulati.com

UrlCleaner.normalizeUrl("www.shekhargulati.com"); // http://shekhargulati.com

UrlCleaner.normalizeUrl("http://shekhargulati.com/%7Eabout/"); // http://shekhargulati.com/~about

UrlCleaner.normalizeUrl("http://shekhargulati.com/hello%5Fabout/"); // http://shekhargulati.com/hello_about

UrlCleaner.normalizeUrl("http://shekhargulati.com?lang=en&article=fred"); // http://shekhargulati.com?article=fred&lang=en

UrlCleaner.normalizeUrl("http://xn--xample-hva.com"); // http://êxample.com

Unshorten URL Usage

UrlCleaner.unshortenUrl("http://bit.ly/1Wtrl9t"); // http://shekhargulati.com/

// It also works for URLs that have been shortened multiple times. The URL below is shortened 4 times.
UrlCleaner.unshortenUrl("http://bit.ly/1pwuGdF"); // http://www.bloomberg.com/news/articles/2016-03-17/unmasking-startup-l-jackson-silicon-valley-s-favorite-twitter-persona
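Resolving a URL that has been shortened several times generally means following redirects one hop at a time until a non-redirect response is reached. The sketch below shows that loop. It is an assumption about how such a feature could work, not urlcleaner's actual code: `UnshortenSketch`, `follow`, and `httpLocation` are hypothetical names. The hop lookup is passed in as a function so the loop can be tried without network access, and `httpLocation` shows one way to read the `Location` header with the JDK's `HttpURLConnection`. It assumes http or https URLs and does not resolve relative `Location` values.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration; urlcleaner's real logic may differ.
class UnshortenSketch {

    // Follows redirects until nextHop returns null, a loop is detected,
    // or maxHops is reached. Returns the last URL seen.
    static String follow(String url, Function<String, String> nextHop, int maxHops) {
        Set<String> seen = new HashSet<>();
        String current = url;
        for (int i = 0; i < maxHops && seen.add(current); i++) {
            String next = nextHop.apply(current);
            if (next == null) {
                return current; // not a redirect: this is the final URL
            }
            current = next;
        }
        return current;
    }

    // Returns the redirect target of an http(s) URL, or null if the response
    // is not a 3xx redirect or the request fails.
    static String httpLocation(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setInstanceFollowRedirects(false); // inspect each hop ourselves
            conn.setRequestMethod("HEAD");
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            return (status >= 300 && status < 400) ? location : null;
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Performs real network requests; the result depends on the remote servers.
        System.out.println(follow("http://bit.ly/1Wtrl9t", UnshortenSketch::httpLocation, 10));
    }
}
```

The hop limit and the set of visited URLs guard against redirect loops, which would otherwise keep the resolver running forever.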

URL Extraction

final String text = "CloudABI now available for Arch Linux https://nuxi.nl/doc/archlinux/  (cmts https://google.com )";
List<String> urls = UrlExtractor.extractUrls(text);  
// urls -> [https://nuxi.nl/doc/archlinux/, https://google.com]
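For readers curious about how extraction of this kind can be done, here is a minimal regex-based sketch. It is not the `UrlExtractor` implementation: `ExtractSketch` and `extract` are hypothetical names, and the pattern is a deliberately simple assumption that only matches `http://` and `https://` URLs and trims trailing punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; urlcleaner's UrlExtractor may work differently.
class ExtractSketch {

    // A scheme followed by any run of characters that are not whitespace,
    // angle brackets, or double quotes.
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"]+");

    static List<String> extract(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL.matcher(text);
        while (matcher.find()) {
            // Trim punctuation that usually belongs to the sentence, not the URL.
            urls.add(matcher.group().replaceAll("[.,;:!?)]+$", ""));
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "CloudABI now available for Arch Linux https://nuxi.nl/doc/archlinux/  (cmts https://google.com )";
        System.out.println(extract(text)); // [https://nuxi.nl/doc/archlinux/, https://google.com]
    }
}
```

A pattern this simple misses URLs without a scheme and will wrongly trim a URL that legitimately ends in a closing parenthesis, so treat it as a starting point only.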

License

urlcleaner is licensed under the MIT License - see the LICENSE file for details.